Video super-resolution processing method and apparatus

ABSTRACT

Embodiments of this application disclose a video super-resolution processing method. The method includes: obtaining encoded information of any coding block in a video stream; determining an inter-frame prediction mode of the any coding block based on an inter-frame prediction marker included in the encoded information; determining a super-resolution pixel block of the any coding block based on the inter-frame prediction mode of the any coding block and pixel information of a matched coding block identified by a matched coding block index included in the encoded information; and stitching super-resolution pixel blocks of all coding blocks that belong to a same image frame in the video stream to obtain a super-resolution image. Power consumption can be reduced while an effect of super-resolution processing performed on a single frame of image in a video is ensured, and super-resolution processing delays of any two frames of images in the video can be shortened.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of International Application No. PCT/CN2020/108814, filed on Aug. 13, 2020, which claims priority to Chinese Patent Application No. 201910805436.0, filed on Aug. 29, 2019. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.

TECHNICAL FIELD

This application relates to the field of computer technologies, and in particular, to a video super-resolution processing method and apparatus.

BACKGROUND

Usually, definition of a video (or an image) is determined by resolution of an image. Low-resolution (low-resolution, LR) means that definition of the image is comparatively low, and high-resolution (high-resolution, HR) means that definition of the image is comparatively high. To obtain a high-resolution video source, a most direct method is to use a high-resolution image sensor. However, the high-resolution video source is extremely scarce in daily life, and storage and transmission of the high-resolution video source will impose a heavy burden on a current transmission network. Therefore, a super-resolution (super-resolution, SR) technology is proposed to restore a high-resolution video on a basis of an existing low-resolution video source. The super-resolution technology improves resolution of an original image by using hardware or software. A process of obtaining a high-resolution image from a series of low-resolution images is super-resolution reconstruction.

Currently, there are many existing image super-resolution algorithms, and a video can also be decomposed into single-frame images for super-resolution processing. To be specific, an entire image is captured from a video stream, super-resolution processing is performed on the entire captured image by using a convolutional neural network, and finally an image obtained after super-resolution processing is retransmitted to the video stream. An effect of image super-resolution processing is positively correlated with a depth and a scale of a convolutional neural network model, but the depth and the scale of the convolutional neural network model are limited by a processing capability and power consumption of a processor (a central processing unit (central processing unit, CPU), a graphics processing unit (graphics processing unit, GPU), and/or an embedded neural-network processing unit (neural-network processing unit, NPU)). Therefore, while ensuring the effect of image super-resolution processing, the processor needs to perform a large quantity of convolution operations to run the convolutional neural network that is for super-resolution processing, and the power consumption also increases with an increase in a calculation amount of the processor. In addition, super-resolution processing of the video can be decomposed into super-resolution processing of each frame of image in the video, and there is a comparatively large amount of pixel data in each frame of image in the video. Therefore, when the processor performs super-resolution processing on each frame of image, processing time is comparatively long because of a large amount of data, and super-resolution processing delays of any two frames of images in the video also increase with an increase in a calculation amount of the processor.

SUMMARY

Embodiments of this application provide a video super-resolution processing method and apparatus, which can reduce power consumption while ensuring an effect of super-resolution processing performed on a single frame of image in a video, and shorten super-resolution processing delays of any two frames of images in the video.

The following describes this application from different aspects. It should be understood that reference may be made to each other for implementations and beneficial effects of the following different aspects.

According to a first aspect, an embodiment of this application provides a video super-resolution processing method, where the method includes: First, a terminal device may obtain encoded information of any coding block in a video stream, where the encoded information may include a reference image frame index, a matched coding block index, and an inter-frame prediction marker. Second, the terminal device may determine an inter-frame prediction mode of the any coding block based on the inter-frame prediction marker, and may determine a super-resolution pixel block of the any coding block based on the inter-frame prediction mode of the any coding block and pixel information of a matched coding block identified by the matched coding block index. Finally, the terminal device may stitch super-resolution pixel blocks of all coding blocks that belong to a same image frame in the video stream to obtain a super-resolution image, and may combine super-resolution images of all image frames in the video stream into a super-resolution video for output. A coding block belongs to an image frame in the video stream, and one image frame in the video stream includes a plurality of coding blocks. The matched coding block identified by the matched coding block index is a coding block that is in a reference image frame identified by the reference image frame index and that has a minimum pixel difference from the any coding block. The super-resolution pixel block of the any coding block is a pixel block obtained after super-resolution processing is performed on pixel information of the any coding block.

Compared with direct super-resolution processing performed on an entire image frame, in this embodiment of this application, the terminal device performs different super-resolution processing on coding blocks in a P frame based on different inter-frame prediction modes of the coding blocks, to obtain super-resolution pixel blocks, and then stitches the super-resolution pixel blocks to obtain a super-resolution image. This not only implements super-resolution processing of an entire image frame, but also can reduce a calculation amount while ensuring an effect of super-resolution processing performed on a single frame of image in a video, thereby reducing power consumption and shortening super-resolution processing delays of any two frames of images in the video.

With reference to the first aspect, in a possible implementation, that the terminal device determines a super-resolution pixel block of the any coding block based on the inter-frame prediction mode of the any coding block and pixel information of a matched coding block identified by the matched coding block index may be specifically: When the inter-frame prediction mode is a first-type prediction mode (that is, a skip mode), it indicates that the pixel information of the any coding block is the same as the pixel information of the matched coding block, and it also indicates that there is no pixel prediction residual between a coding block in the skip mode and a matched coding block of the coding block. In this case, the terminal device may directly use a matched super-resolution pixel block of the matched coding block as the super-resolution pixel block of the any coding block. The matched super-resolution pixel block may be a pixel block obtained after super-resolution processing is performed on the pixel information of the matched coding block. A time sequence of the reference image frame is prior to a time sequence of an image frame to which the any coding block belongs. Therefore, the terminal device has obtained a super-resolution image of the reference image frame when processing the any coding block, and thus can directly obtain the matched super-resolution pixel block of the matched coding block in the reference image frame. When the inter-frame prediction mode of the any coding block is the skip mode, the terminal device in this embodiment of this application directly uses the matched super-resolution pixel block that is obtained as the super-resolution pixel block of the any coding block, and does not need to perform super-resolution processing on the pixel information of the any coding block. Therefore, power consumption generated during super-resolution processing can be reduced, that is, power consumption of the terminal device is reduced. In addition, time for super-resolution processing of a single frame of image can be shortened, and thus super-resolution processing delays of any two frames of images can be shortened.
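For illustration only, the skip-mode reuse described above may be sketched as follows. The class name SrBlockCache, the method names, and the keying of stored blocks by (reference image frame index, matched coding block index) are assumptions made for this sketch and are not defined in this application.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]
# Hypothetical key: (reference image frame index, matched coding block index).
BlockKey = Tuple[int, int]


class SrBlockCache:
    """Keeps super-resolution pixel blocks of frames that were already processed."""

    def __init__(self) -> None:
        self._blocks: Dict[BlockKey, Matrix] = {}

    def store(self, frame_index: int, block_index: int, sr_block: Matrix) -> None:
        self._blocks[(frame_index, block_index)] = sr_block

    def reuse_for_skip(self, ref_frame_index: int, matched_block_index: int) -> Matrix:
        # Skip mode: there is no pixel prediction residual, so the matched
        # block's super-resolution result is reused without running the model.
        return self._blocks[(ref_frame_index, matched_block_index)]


cache = SrBlockCache()
# The reference frame precedes the current frame, so its blocks are already stored.
cache.store(0, 7, [[10, 10], [10, 10]])
sr_block = cache.reuse_for_skip(ref_frame_index=0, matched_block_index=7)
```

In this sketch no convolution operation is performed for the skip-mode block, which is where the reduction in calculation amount would come from.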

With reference to the first aspect, in a possible implementation, that the terminal device determines a super-resolution pixel block of the any coding block based on the inter-frame prediction mode of the any coding block and pixel information of a matched coding block identified by the matched coding block index may further be: When the inter-frame prediction mode is a second-type prediction mode (that is, a merge mode or an inter-frame AMVP mode), it indicates that the pixel information of the any coding block is not completely the same as the pixel information of the matched coding block, and it also indicates that there is a pixel prediction residual between a coding block in the merge mode or the AMVP mode and a matched coding block of the coding block. In this case, the terminal device may obtain a pixel prediction residual in the encoded information of the any coding block, may determine the pixel information of the any coding block based on the pixel information of the matched coding block and the pixel prediction residual, and then may input the pixel information of the any coding block into a super-resolution model for super-resolution processing, to obtain the super-resolution pixel block of the any coding block. When the inter-frame prediction mode of the any coding block is the merge mode or the AMVP mode, the terminal device in this embodiment of this application inputs the pixel information of the any coding block into the super-resolution model for super-resolution processing. This can ensure a super-resolution effect of a super-resolution pixel block, thereby ensuring a super-resolution effect of an entire image frame.

With reference to the first aspect, in a possible implementation, the pixel information may be a pixel matrix, and the pixel prediction residual may be a pixel residual matrix. When determining the pixel information of the any coding block based on the pixel information of the matched coding block and the pixel prediction residual, the terminal device may use a sum of a pixel matrix of the matched coding block and the pixel residual matrix as a pixel matrix of the any coding block.
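As a minimal worked example of the matrix sum described above, the following sketch adds a pixel residual matrix to the pixel matrix of a matched coding block. The function name reconstruct_pixels and the sample values are illustrative assumptions only.

```python
from typing import List

Matrix = List[List[int]]


def reconstruct_pixels(matched: Matrix, residual: Matrix) -> Matrix:
    """Element-wise sum of the matched block's pixel matrix and the residual matrix."""
    same_shape = len(matched) == len(residual) and all(
        len(m_row) == len(r_row) for m_row, r_row in zip(matched, residual)
    )
    if not same_shape:
        raise ValueError("matched block and residual must have the same shape")
    return [
        [m + r for m, r in zip(m_row, r_row)]
        for m_row, r_row in zip(matched, residual)
    ]


matched = [[100, 102], [98, 101]]   # pixel matrix of the matched coding block
residual = [[1, -2], [0, 3]]        # pixel residual matrix from the encoded information
pixels = reconstruct_pixels(matched, residual)
# pixels is [[101, 100], [98, 104]]
```

The resulting matrix is what would then be input into the super-resolution model for a coding block in the merge mode or the AMVP mode.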

With reference to the first aspect, in a possible implementation, when inputting the pixel information of the any coding block into the super-resolution model for super-resolution processing, to obtain the super-resolution pixel block of the any coding block, the terminal device may obtain a first pixel region in the any coding block (that is, an edge region in the any coding block), may perform pixel padding on the first pixel region (that is, perform edge compensation on the edge region), and then may input pixel information of the post-pixel-padding first pixel region into the super-resolution model for super-resolution processing, to obtain a first pixel block corresponding to the any coding block (that is, perform super-resolution processing on the edge region on which edge compensation has been performed). Further, the terminal device may perform super-resolution processing on the pixel information of the any coding block, and may stitch the first pixel block and a pixel block obtained after super-resolution processing is performed on the pixel information of the any coding block, to obtain the super-resolution pixel block of the any coding block (that is, stitch a result obtained after super-resolution processing is performed on the edge region and a result obtained after super-resolution processing is performed on the coding block, to obtain a complete super-resolution pixel block). The pixel information of the first pixel region may be pixel information in the pixel information of the any coding block except pixel information of a second pixel region, and the second pixel region is a pixel region (that is, a central region) of a preset size in the any coding block. The terminal device in this embodiment of this application extracts the edge region of the coding block, then performs super-resolution processing on the post-pixel-padding edge region to obtain the first pixel block, and finally stitches the first pixel block and the pixel block obtained after super-resolution processing is performed on the pixel information of the coding block, to obtain the super-resolution pixel block of the coding block. This can significantly ease a problem of a boundary between coding blocks, and can further ensure a super-resolution processing effect of a single frame of image.

With reference to the first aspect, in a possible implementation, when obtaining the first pixel region of the any coding block (that is, the edge region of the any coding block), the terminal device may determine the second pixel region of the preset size in the any coding block (that is, the central region in the any coding block), and determine, as the first pixel region (that is, the edge region of the any coding block), a region that is in the any coding block and that does not overlap with the second pixel region. The preset size may be determined based on a convolutional layer quantity and a convolution kernel size that are of the super-resolution model.
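The split of a coding block into a central region and an edge region may be sketched as follows. How the preset size follows from the convolutional layer quantity and the convolution kernel size is not given in this passage, so the sketch simply takes a hypothetical margin parameter (the width of the edge band in pixels) as input; the function name edge_mask is likewise an assumption.

```python
from typing import List


def edge_mask(height: int, width: int, margin: int) -> List[List[bool]]:
    """Return a mask in which True marks the first pixel region (edge region)
    and False marks the second pixel region (central region).

    margin is a hypothetical input: the number of pixels on each side of the
    coding block that fall outside the central region.
    """
    if margin < 0 or 2 * margin > min(height, width):
        raise ValueError("margin does not fit inside the coding block")
    return [
        [
            not (margin <= y < height - margin and margin <= x < width - margin)
            for x in range(width)
        ]
        for y in range(height)
    ]


mask = edge_mask(height=4, width=4, margin=1)
edge_pixel_count = sum(cell for row in mask for cell in row)
# A 4x4 block with a 2x2 central region leaves 12 edge pixels.
```

Every pixel of the coding block falls in exactly one of the two regions, which matches the statement that the first pixel region does not overlap with the second pixel region.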

With reference to the first aspect, in a possible implementation, the encoded information further includes a motion vector predictor MVP and/or a motion vector difference MVD. That the terminal device stitches super-resolution pixel blocks of all coding blocks that belong to a same image frame in the video stream to obtain a super-resolution image may be specifically: The terminal device may determine a motion vector MV of the any coding block based on the inter-frame prediction mode, and the MVP and/or the MVD, may determine, based on the MV and a location of the matched coding block in the reference image frame, a location of the any coding block in an image frame to which the any coding block belongs, and finally, may stitch the super-resolution pixel blocks of all the coding blocks that belong to the same image frame in the video stream based on locations of all the coding blocks in the image frame to which all the coding blocks belong, to obtain the super-resolution image of the image frame to which all the coding blocks belong. The terminal device in this embodiment of this application determines the MV based on different inter-frame prediction modes, and determines, by using the MV, the location of the any coding block in the image frame to which the any coding block belongs, thereby ensuring that a super-resolution pixel block is in a correct location during stitching, and that a complete super-resolution image is obtained.
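The stitching step may be sketched as follows, assuming that the location of each coding block is already known as a (top, left) position in original-resolution pixels and that the magnification is an integer factor. The function name stitch_blocks, the location convention, and the zero-filled canvas are assumptions of this sketch.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def stitch_blocks(
    frame_height: int,
    frame_width: int,
    scale: int,
    blocks: List[Tuple[Tuple[int, int], Matrix]],
) -> Matrix:
    """Paste super-resolution pixel blocks into one super-resolution image.

    Each entry of blocks is ((top, left), sr_block), where (top, left) is the
    block location in the original frame and sr_block is already magnified.
    """
    canvas = [[0] * (frame_width * scale) for _ in range(frame_height * scale)]
    for (top, left), sr_block in blocks:
        for dy, row in enumerate(sr_block):
            y = top * scale + dy
            x = left * scale
            if y >= len(canvas) or x + len(row) > len(canvas[y]):
                raise ValueError("super-resolution block exceeds the frame")
            canvas[y][x:x + len(row)] = row
    return canvas


# Two 2x2 coding blocks side by side in a 2x4 frame, each magnified by 2.
left_block = [[1] * 4 for _ in range(4)]
right_block = [[2] * 4 for _ in range(4)]
image = stitch_blocks(2, 4, 2, [((0, 0), left_block), ((0, 2), right_block)])
```

Because block locations are scaled by the same factor as the pixel blocks, a block that is placed correctly in the original frame lands in the corresponding position of the super-resolution image.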

With reference to the first aspect, in a possible implementation, the terminal device determines the motion vector MV of the any coding block based on the inter-frame prediction mode, and the MVP and/or the MVD. When the inter-frame prediction mode of the any coding block is the first-type prediction mode (the skip mode), the terminal device uses the MVP in the encoded information as the motion vector MV of the any coding block; and when the inter-frame prediction mode of the any coding block is the second-type prediction mode (the merge mode or the AMVP mode), the terminal device uses a sum of the MVP and the MVD that are in the encoded information as the MV of the any coding block, where the MVD is 0 in the merge mode.
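The mode-dependent derivation of the MV described above can be summarized in a short sketch. The mode strings "skip", "merge", and "amvp", the two-component tuple representation of a vector, and the function name motion_vector are assumptions made for illustration.

```python
from typing import Optional, Tuple

Vec = Tuple[int, int]


def motion_vector(mode: str, mvp: Vec, mvd: Optional[Vec] = None) -> Vec:
    """Derive the MV of a coding block from its inter-frame prediction mode."""
    if mode == "skip":
        # First-type prediction mode: the MVP is used directly as the MV.
        return mvp
    if mode == "merge":
        # Second-type prediction mode with an MVD of 0, so MV = MVP + 0.
        return mvp
    if mode == "amvp":
        # Second-type prediction mode: MV = MVP + MVD.
        if mvd is None:
            raise ValueError("the AMVP mode requires an MVD")
        return (mvp[0] + mvd[0], mvp[1] + mvd[1])
    raise ValueError(f"unknown inter-frame prediction mode: {mode}")


mv_skip = motion_vector("skip", (3, -1))
mv_merge = motion_vector("merge", (3, -1))
mv_amvp = motion_vector("amvp", (3, -1), (1, 2))
# mv_amvp is (4, 1)
```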

With reference to the first aspect, in a possible implementation, a frame type of the image frame to which the any coding block belongs is a P frame.

With reference to the first aspect, in a possible implementation, the video stream further includes a first image frame whose frame type is an I frame. The method may further include: When a frame type of an image frame to which a coding block in the video stream belongs is the I frame, the terminal device may obtain pixel information of the first image frame, and then may input the pixel information of the first image frame into the super-resolution model for super-resolution processing, to obtain a super-resolution image of the first image frame, where the first image frame herein indicates the I frame.

The terminal device may further combine the super-resolution images of all the image frames in the video stream into the super-resolution video.

The terminal device in this embodiment of this application uses different super-resolution processing manners for image frames of different frame types in a video stream. For a P frame, the terminal device performs super-resolution processing on each coding block in the P frame. For an I frame, the terminal device directly performs super-resolution processing on a complete I-frame image. This improves the video super-resolution processing method, and ensures a super-resolution processing effect of each image frame in the video stream.
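To make the difference between the two frame types concrete, the following sketch counts how many times the super-resolution model would be run for a frame under the scheme described above. The function name plan_frame, the mode strings, and the returned dictionary layout are illustrative assumptions; the sketch counts one model run per merge-mode or AMVP-mode coding block and ignores any additional run for an edge region.

```python
from typing import Dict, List


def plan_frame(frame_type: str, block_modes: List[str]) -> Dict[str, int]:
    """Count model runs and reused blocks for one image frame."""
    plan = {"whole_frame_runs": 0, "block_runs": 0, "reused_blocks": 0}
    if frame_type == "I":
        # An I frame is input into the model as one complete image.
        plan["whole_frame_runs"] = 1
    elif frame_type == "P":
        for mode in block_modes:
            if mode == "skip":
                # Skip mode: the matched block's result is reused.
                plan["reused_blocks"] += 1
            elif mode in ("merge", "amvp"):
                # Merge or AMVP mode: the block is input into the model.
                plan["block_runs"] += 1
            else:
                raise ValueError(f"unknown inter-frame prediction mode: {mode}")
    else:
        raise ValueError(f"unsupported frame type: {frame_type}")
    return plan


i_plan = plan_frame("I", [])
p_plan = plan_frame("P", ["skip", "merge", "skip", "amvp"])
```

In this example the P frame needs the model for only two of its four coding blocks, and the other two results are reused from the reference image frame.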

According to a second aspect, an embodiment of this application provides a video super-resolution processing apparatus, where the video super-resolution processing apparatus includes units and/or modules configured to perform the video super-resolution processing method provided in any one of the first aspect and/or the possible implementations of the first aspect, and therefore can also implement beneficial effects (or advantages) of the video super-resolution processing method provided in the first aspect.

According to a third aspect, an embodiment of this application provides a terminal device, including a processor and a memory, where the memory is configured to store a computer program, the computer program includes program instructions, and when the processor runs the program instructions, the terminal device performs the video super-resolution processing method provided in the first aspect. The terminal device may further include a receiver, where the receiver is configured to receive a video stream transmitted on a network.

According to a fourth aspect, an embodiment of this application provides a computer program product, where the computer program product includes computer program code, and when the computer program code is run on a computer, the computer is enabled to perform the video super-resolution processing method provided in the first aspect.

According to a fifth aspect, an embodiment of this application provides a chip, including a processor. The processor is configured to read and execute a computer program stored in a memory, to perform the video super-resolution processing method provided in any possible implementation of the first aspect. Optionally, the chip further includes the memory, and the memory is connected to the processor by using a circuit or a wire. Further optionally, the chip further includes a communications interface, and the processor is connected to the communications interface. The communications interface is configured to receive data and/or information that needs to be processed. The processor obtains the data and/or information from the communications interface, processes the data and/or information, and outputs a processing result through the communications interface. The communications interface may be an input/output interface.

Optionally, the processor and the memory may be physically independent units, or the memory may be integrated with the processor.

According to a sixth aspect, an embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium stores computer program instructions. When the computer program instructions are run on a computer, the computer is enabled to perform the video super-resolution processing method provided in the first aspect.

Implementing the embodiments of this application can reduce power consumption while ensuring a super-resolution processing effect of a single-frame image in a video, and can also shorten super-resolution processing delays of any two frames of images in the video.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a schematic diagram of a system architecture according to an embodiment of this application;

FIG. 2 is a schematic flowchart of a video super-resolution processing method according to an embodiment of this application;

FIG. 3 is a schematic diagram of locations of header files of image frames according to an embodiment of this application;

FIG. 4 is a schematic diagram of valid padding and same padding according to an embodiment of this application;

FIG. 5 is a schematic flowchart of determining a super-resolution pixel block according to an embodiment of this application;

FIG. 6 is a schematic diagram of pixel padding in a first pixel region according to an embodiment of this application;

FIG. 7a-1 and FIG. 7a-2 are a schematic diagram of determining a super-resolution pixel block according to an embodiment of this application;

FIG. 7b is a schematic diagram of locations of coding blocks according to an embodiment of this application;

FIG. 8 is a schematic diagram of coding block stitching according to an embodiment of this application;

FIG. 9 is a schematic diagram of internal implementation of super-resolution processing performed by a terminal device on a P frame according to an embodiment of this application;

FIG. 10 is a schematic structural diagram of a video super-resolution processing apparatus according to an embodiment of this application; and

FIG. 11 is a schematic structural diagram of a terminal device according to an embodiment of this application.

DESCRIPTION OF EMBODIMENTS

The following clearly and completely describes the technical solutions in the embodiments of this application with reference to the accompanying drawings in the embodiments of this application.

The video super-resolution processing method provided in the embodiments of this application is applicable to video processing application scenarios such as a video call, a video conference, video playing (including video on demand, live video, playing of film and television works or short videos, and the like), video surveillance, and video recording.

In some feasible implementations, a system architecture in the foregoing application scenarios may usually include a video transmit end and a video receive end. FIG. 1 is a schematic diagram of a system architecture according to an embodiment of this application. A video transmit end may include a video input module and a video encoding module; the video input module may be a camera; and the video encoding module may be a video encoder. A video receive end may include a video decoding module, a decoded-frame extraction module, a super-resolution module, and a video output module; the video decoding module may be a video decoder; the super-resolution module may include a super-resolution model; and the video output module may be a display. The video transmit end may input a video collected or stored by the video input module into the video encoding module for video compression coding. The video is transmitted from the video transmit end to the video receive end through network transmission. The video receive end may input the video received from a network into the video decoding module for video decoding, extract a decoded frame of the video by using the decoded-frame extraction module, perform super-resolution processing on the decoded frame of the video by using the super-resolution module, and finally output, by using the video output module, a video obtained after super-resolution processing. Optionally, network transmission may include wired network transmission and wireless network transmission. A medium for wired network transmission may be a coaxial cable, a network cable (twisted pair), or an optical fiber. A carrier for wireless network transmission may be a radio wave, and a wireless network may include a wireless local area network, a wireless metropolitan area network, a wireless personal network, and the like.

In a video call, a video conference, or a live video scenario, the video transmit end may be a terminal that has shooting and video encoding functions, such as a mobile phone, a notebook computer, a tablet computer, a desktop computer, or a conference terminal; and the video receive end may be a terminal that has video decoding and display functions, such as a mobile phone, a notebook computer, a tablet computer, a desktop computer, or a conference terminal. In a video on demand scenario, or a scenario of playing film and television works or a short video, the video transmit end may be a cloud server, and the video input module may be storage space of the cloud server, where the storage space of the cloud server may store various film and television works, short videos, video on demand resources (such as audio and video programs), and the like; and the video receive end may be a terminal that has video decoding and display functions, such as a mobile phone, a notebook computer, a desktop computer, a tablet computer, or a smart TV. In a video surveillance scenario, the video transmit end may be a device that has surveillance camera shooting and video encoding functions, such as a network camera (the network camera is an advanced camera device that integrates shooting, video encoding, and world wide web services); and the video receive end may be a terminal that has video decoding and display functions, such as a mobile phone, a notebook computer, a tablet computer, or a desktop computer.

In some other feasible implementations, the system architecture provided in this embodiment of this application may include only a video receive end. In this case, the video receive end may include a video receiving module, a video decoding module, a decoded-frame extraction module, a super-resolution module, and a video output module. The video receive end may include a user interface or a camera. The video decoding module may be a video decoder, the super-resolution module may include a super-resolution model, and the video output module may be a display. The video receive end may receive a video transmitted on a network or collect a video recorded by a camera, input the video into the video decoding module for video decoding, extract a decoded frame of the video by using the decoded-frame extraction module, perform super-resolution processing on the decoded frame of the video by using the super-resolution module, and finally output, by using the video output module, a video obtained after super-resolution processing. The user interface may be configured to receive a video transmitted on the network. For example, in a video recording scenario, the system architecture may include only a video receive end. In this case, the video receive end may be a device that has a video shooting function, such as a mobile phone. In some other feasible implementations, a video receive end (or referred to as a terminal) may read a locally stored video, perform super-resolution processing, and then display locally a video obtained after super-resolution processing, or send a video obtained after super-resolution processing to another device for display.

For ease of understanding, some terms (nouns) related to the method provided in the embodiments of this application are briefly described in the following.

1. Video Compression Coding

From a viewpoint of information theory, data describing a signal source is a sum of information and data redundancy, that is, data = information + data redundancy. There are many types of data redundancy, such as spatial redundancy, temporal redundancy, visual redundancy, and statistical redundancy. When an image serves as a signal source, the essence of video compression coding is to reduce the data redundancy in the image.

Commonly used video compression coding standards include high efficiency video coding (high efficiency video coding, HEVC, also referred to as H.265). A main idea of the HEVC is to search a single frame of image or a plurality of frames of images for pixel blocks (or macroblocks) with redundant information. In a video compression process, these pixel blocks with redundant information are described by some information (such as pixel prediction residuals and motion vector differences) rather than original pixel values, thereby achieving high efficiency video compression. An HEVC process may include inter-frame prediction and motion estimation.

2. Inter-Frame Prediction

Inter-frame prediction may indicate that a pixel of a current image is predicted by using a correlation between video image frames, that is, a time domain correlation, and by using a pixel of an adjacent encoded image (that is, a reference image frame), to achieve an objective of removing time domain redundancy from a video (or implementing image compression). A difference between the pixel of the current image and the pixel of the adjacent encoded image is a pixel prediction residual.

3. Motion Estimation

Each frame of image in a video is divided into a plurality of non-overlapping macroblocks (pixel blocks), and it is considered that all pixels in each macroblock have the same displacement (that is, location coordinates of the pixels in the image are the same). For any macroblock i of a plurality of macroblocks in one frame of image, a macroblock that has a minimum pixel difference from the macroblock i, that is, a matched block, is searched for within a specific search range of a reference image frame (that is, a reference frame) according to a specific matching rule. Relative displacement between the matched block and the macroblock i may be a motion vector (motion vector, MV), and a process of obtaining the motion vector may be referred to as motion estimation. The minimum pixel difference may indicate a minimum rate distortion cost in high efficiency video coding. A commonly used matching rule may include a minimum mean square error, a minimum average absolute error, a maximum quantity of matched pixels, and the like. A search algorithm for the high efficiency video coding H.265 may include a full search algorithm and a TZSearch algorithm. The macroblock may be referred to as a coding block at a video receive end, and the matched block may be referred to as a matched coding block at the video receive end.
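As an illustration of motion estimation by full search, the following sketch scans every candidate position within a search range and keeps the one with the smallest sum of absolute differences. The sum of absolute differences is used here as a simple stand-in for the matching rules listed above and is not the rate distortion cost of H.265; the function names, the (row, column) ordering, and the convention that the MV points from the current block position to the matched block position are all assumptions of this sketch.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def sad(block: Matrix, frame: Matrix, top: int, left: int) -> int:
    """Sum of absolute differences between block and the frame area at (top, left)."""
    return sum(
        abs(pixel - frame[top + dy][left + dx])
        for dy, row in enumerate(block)
        for dx, pixel in enumerate(row)
    )


def full_search(
    block: Matrix, position: Tuple[int, int], ref_frame: Matrix, search_range: int
) -> Tuple[Tuple[int, int], int]:
    """Return ((dy, dx), cost) of the best match within +/- search_range pixels.

    Assumes the block at its own position lies inside the reference frame,
    so at least the zero displacement is always a valid candidate.
    """
    block_h, block_w = len(block), len(block[0])
    frame_h, frame_w = len(ref_frame), len(ref_frame[0])
    cur_y, cur_x = position
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            top, left = cur_y + dy, cur_x + dx
            if top < 0 or left < 0 or top + block_h > frame_h or left + block_w > frame_w:
                continue  # candidate falls outside the reference frame
            cost = sad(block, ref_frame, top, left)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


ref_frame = [
    [0, 0, 0, 0],
    [0, 0, 9, 8],
    [0, 0, 7, 6],
    [0, 0, 0, 0],
]
current_block = [[9, 8], [7, 6]]
mv, cost = full_search(current_block, position=(1, 1), ref_frame=ref_frame, search_range=1)
# The block content sits one column to the right in the reference frame.
```

A full search evaluates every candidate and is therefore the most expensive option; faster algorithms such as TZSearch examine fewer candidates.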

4. Motion Vector Difference

A difference between an MV of a current block (for example, the macroblock i) and a motion vector of a candidate block (that is, a motion vector predictor (motion vector predictor, MVP)) is a motion vector difference (motion vector difference, MVD), and the candidate block may be a macroblock adjacent to an image frame to which the current block belongs.

In some feasible implementations, because a video may be a series ofconsecutive images, super-resolution processing of the video may bedecomposed into super-resolution processing of a single frame of image,and an image obtained after super-resolution processing is an imagewhose magnification is an integer multiple relative to an originalimage. Image super-resolution processing performed by a mobile terminalis used as an example for description. Specifically, the mobile terminalmay divide an image (or a picture) into a plurality of image blocks;perform mathematical interpolation stretching on an image block withless texture information in the plurality of image blocks, to obtain anenlarged image block; input an image block with more texture informationin the plurality of image blocks into a super-resolution model (aconvolutional neural network) for super-resolution processing; andfinally stitch an image block obtained after mathematical interpolationstretching and an image block output by the super-resolution model, toobtain a super-resolution image. During image super-resolutionprocessing performed by the mobile terminal, an image is also stretchedand magnified through mathematical interpolation, but definition of animage block obtained after mathematical interpolation processing islower than definition of an image block obtained after super-resolutionprocessing performed by using the super-resolution model. Therefore,when two adjacent image blocks in an image are processed differently (tobe specific, mathematical interpolation stretching is performed on oneimage block of the adjacent image blocks, and super-resolutionprocessing is performed on the other image block of the adjacent imageblocks by using the super-resolution model), definition of twopost-processing adjacent image blocks is different. Therefore, anobvious boundary problem occurs after the two post-processing adjacentimage blocks are stitched. 
Optionally, the super-resolution model may be any one of the following neural network models: a super-resolution convolutional neural network (super-resolution convolutional neural network, SRCNN), a fast super-resolution convolutional neural network (fast super-resolution convolutional neural network, FSRCNN), accurate image super-resolution using very deep convolutional networks (accurate image super-resolution using very deep convolutional networks, VDSR), a cascading residual network (cascading residual network, CARN), or multi-objective reinforced evolution in mobile neural architecture search (multi-objective reinforced evolution in mobile neural architecture search, MoreMNAS-A).

To address a problem that an effect of image super-resolution processing (including a definition problem and a boundary problem) and power consumption cannot be balanced in image super-resolution processing performed by the mobile terminal, this application provides a video super-resolution processing method. The method can reduce power consumption while ensuring an effect of super-resolution processing performed on a single frame of image in a video, resolve a problem of a boundary between adjacent image blocks in a single frame of image, and further shorten super-resolution processing delays of any two frames of images in the video. Therefore, a stalling problem that occurs when a larger-scale super-resolution model performs video super-resolution processing can be resolved.

In some feasible implementations, the video super-resolution processing method provided in this application may be applied to a video receive end, for example, used in a super-resolution module in FIG. 1. The video receive end may be a terminal device such as a mobile phone, a notebook computer, a tablet computer, a smart TV, an augmented reality (augmented reality, AR) device/a virtual reality (virtual reality, VR) device, or an autonomous driving device, or another type of device. For ease of description, the video super-resolution processing method provided in this application is described in the following by using a terminal device as an example.

FIG. 2 is a schematic flowchart of a video super-resolution processing method according to an embodiment of this application. As shown in FIG. 2, the video super-resolution processing method provided in this embodiment of this application may include the following steps.

S201: Receive a video stream transmitted from a network, and perform video decoding on the received video stream, to obtain a coding block included in the video stream.

In some feasible implementations, the video stream provided in this embodiment of this application may be a real-time video stream, for example, a video stream during a video call or a video stream during live broadcast, or may be a video stream stored in a cloud server, for example, a video stream of a movie or a TV series. A type of the video stream is not limited in this embodiment of this application.

In some feasible implementations, a terminal device may receive, by using various video applications (application, APP), the video stream transmitted on the network, and may perform video decoding on the received video stream, to obtain the coding block included in the video stream. The video stream may include a plurality of coding blocks, a coding block may belong to an image frame in the video stream, one image frame may include a plurality of coding blocks, and a plurality of image frames may belong to the same video stream. Optionally, an image obtained after decoding of the video stream is completed may be buffered in a decoded picture buffer (decoded picture buffer, DPB), to be used as a reference image of a subsequent image frame.

S202: Obtain a frame type of an image frame to which any coding block in the video stream belongs.

In some feasible implementations, each image frame in the video stream has a header file, and the header file of the image frame may include information such as a frame type identifier and an image frame index (the image frame index may be used to identify which frame in the video stream an image frame is). FIG. 3 is a schematic diagram of locations of header files of image frames according to an embodiment of this application. As shown in FIG. 3, m coding blocks, namely, coding blocks 11, 12, . . . , and 1m, in a video stream all belong to an image frame 1; k coding blocks, namely, coding blocks 21, 22, . . . , and 2k, all belong to an image frame 2; and n image frames, namely, image frames 1, 2, . . . , and n, belong to the video stream, where m and k may be the same or different, and n, m, and k each are greater than 1. When the video stream is transmitted on the network, the header file of each image frame is transmitted in a form of a hexadecimal code segment. Therefore, the terminal device may obtain a header file of the image frame to which the any coding block in the video stream belongs, and may determine, based on a frame type identifier in the header file, the frame type of the image frame to which the any coding block belongs. For example, when the frame type identifier is “5”, it is determined that the frame type of the image frame to which the any coding block belongs is an I frame; or when the frame type identifier is “1”, it is determined that the frame type of the image frame to which the any coding block belongs is a P frame. If the frame type of the image frame to which the any coding block belongs is the P frame, the terminal device performs step S203 to step S205 after step S202. If the frame type of the image frame to which the any coding block belongs is the I frame, the terminal device performs step S206 and step S207 after step S202.
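The branching after step S202 can be sketched as follows. This is a minimal illustration that assumes the frame type identifier has already been parsed from the header file as a string; the identifier values “5” and “1” are taken from the example above, and the function and constant names are hypothetical.

```python
I_FRAME = "I"
P_FRAME = "P"

# Identifier values from the example in the text: "5" -> I frame, "1" -> P frame.
FRAME_TYPE_BY_IDENTIFIER = {"5": I_FRAME, "1": P_FRAME}


def frame_type_of(identifier: str) -> str:
    """Map a frame type identifier to a frame type."""
    try:
        return FRAME_TYPE_BY_IDENTIFIER[identifier]
    except KeyError:
        # Other identifier values are not described in the text.
        raise ValueError(f"unsupported frame type identifier: {identifier!r}")


def steps_after_s202(identifier: str) -> list:
    """Return the steps performed after S202 for the given frame type identifier."""
    if frame_type_of(identifier) == P_FRAME:
        return ["S203", "S204", "S205"]
    return ["S206", "S207"]
```

For instance, `steps_after_s202("1")` selects the per-coding-block path used for a P frame, and `steps_after_s202("5")` selects the whole-frame path used for an I frame.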

S203: If the frame type of the image frame to which the any coding block in the video stream belongs is the P frame, obtain encoded information of the any coding block in the video stream.

In some feasible implementations, the encoded information may include information such as a reference image frame index (that is, a reference frame index), an inter-frame prediction marker, a matched coding block index, and/or a coding block size. The inter-frame prediction marker may be used to identify an inter-frame prediction mode used by the any coding block. The matched coding block index may be used to identify a matched coding block. The matched coding block may be a coding block that is in another image frame (an image frame other than the image frame to which the any coding block belongs) and that has a minimum pixel difference (a minimum rate distortion cost) from the any coding block. The reference image frame index may be used to identify an image frame to which the matched coding block belongs. The coding block size may be a size of a coding block, for example, 8×8 pixels, 16×16 pixels, or 32×32 pixels. The inter-frame prediction mode provided in this embodiment of this application may include an inter-frame mode (AMVP mode), a skip mode (skip mode), and a merge mode (merge mode). The inter-frame prediction mode used by the any coding block may be any one of the AMVP mode, the skip mode, or the merge mode.
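The fields of the encoded information listed above can be grouped in a simple record. The following is only an illustrative sketch: the class name, field names, and field types are assumptions, and the coding block size is optional because the text lists it with “and/or”.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EncodedInfo:
    """Hypothetical container for the encoded information of one coding block."""

    reference_frame_index: int       # identifies the frame holding the matched block
    inter_prediction_marker: str     # identifies the inter-frame prediction mode
    matched_block_index: int         # identifies the matched coding block
    block_size: Optional[Tuple[int, int]] = None  # for example (8, 8), in pixels


info = EncodedInfo(
    reference_frame_index=0,
    inter_prediction_marker="0",
    matched_block_index=7,
    block_size=(8, 8),
)
```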

S204: Determine, based on the inter-frame prediction marker included in the encoded information, the inter-frame prediction mode of the any coding block, and determine a super-resolution pixel block of the any coding block based on the inter-frame prediction mode of the any coding block and pixel information of the matched coding block identified by the matched coding block index included in the encoded information.

In some feasible implementations, the encoded information may include the reference image frame index, the matched coding block index, the inter-frame prediction marker, and the like. The super-resolution pixel block may include a pixel block obtained after super-resolution processing is performed on pixel information of the any coding block by a super-resolution model. The super-resolution model may be a convolutional neural network model. In a process of convolution feature extraction performed by using the convolutional neural network model, because a dimension of a feature map output by a convolutional layer of a convolutional neural network after a convolution operation is performed is less than a dimension of an input image (for example, the dimension of the input image is 3×3, a size of a convolution kernel is 3×3, and the dimension of the feature map output by the convolutional layer after the convolution operation is performed is 1×1), edge pixel padding is required during convolution feature extraction, so that a size (or dimension) of the feature map output by the convolutional layer is consistent with a size (or dimension) of the input image. There are two commonly used edge pixel padding manners: valid padding (valid padding) and same padding (same padding). FIG. 4 is a schematic diagram of valid padding and same padding according to an embodiment of this application. As shown in FIG. 4, for the valid padding (valid padding), pixel value padding is not performed on an input image, to be specific, a convolution feature of the input image is directly extracted during convolution feature extraction; and if a quantity of remaining pixels is less than a size of a convolution kernel, the remaining pixels are directly discarded.
For the same padding (same padding), pixels whose pixel values are 0 are padded around the input image, to be specific, a convolution feature of an input image obtained after padding with zero-value pixels is performed is extracted during convolution feature extraction; and if the quantity of the remaining pixels is less than the size of the convolution kernel, pixels whose pixel values are 0 are padded so that a quantity of pixels obtained after padding is the same as the size of the convolution kernel.
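The effect of the two padding manners on the output dimension can be checked with a small sketch. It assumes a stride of 1, an odd-sized kernel, and a single channel, and it computes the sliding-window sum of products that convolutional layers typically use; the function name `conv2d` is hypothetical.

```python
def conv2d(image, kernel, padding="valid"):
    """Stride-1 sliding-window sum of products over a 2D list (single channel)."""
    kh, kw = len(kernel), len(kernel[0])
    if padding == "same":
        # Surround the image with zero-value pixels (assumes an odd kernel size).
        ph, pw = kh // 2, kw // 2
        width = len(image[0]) + 2 * pw
        image = (
            [[0] * width for _ in range(ph)]
            + [[0] * pw + list(row) + [0] * pw for row in image]
            + [[0] * width for _ in range(ph)]
        )
    elif padding != "valid":
        raise ValueError("padding must be 'valid' or 'same'")
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


ones_3x3 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
valid_out = conv2d(ones_3x3, ones_3x3, padding="valid")  # 1x1 feature map
same_out = conv2d(ones_3x3, ones_3x3, padding="same")    # 3x3 feature map
```

With a 3×3 input and a 3×3 kernel, the valid padding yields a 1×1 feature map, matching the example above, and the same padding keeps the 3×3 size, with smaller values near the border because the padded zeros contribute nothing to the sums.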

In some feasible implementations, referring to FIG. 5, FIG. 5 is a schematic flowchart of determining a super-resolution pixel block according to an embodiment of this application. In step S204, the determining a super-resolution pixel block of the any coding block may include the following steps.

S2041: Determine the inter-frame prediction mode of the any coding block based on the inter-frame prediction marker included in the encoded information.

In some feasible implementations, the terminal device may determine the inter-frame prediction mode identified by the inter-frame prediction marker in the encoded information as the inter-frame prediction mode of the any coding block. For example, when the inter-frame prediction marker is “2”, it is determined that the inter-frame prediction mode of the any coding block is the AMVP mode; when the inter-frame prediction marker is “1”, it is determined that the inter-frame prediction mode of the any coding block is the merge mode; and when the inter-frame prediction marker is “0”, it is determined that the inter-frame prediction mode of the any coding block is the skip mode.
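Step S2041 amounts to a table lookup. The following sketch uses the marker values from the example above (“2”, “1”, and “0”); the enumeration and function names are hypothetical, and the marker is assumed to be available as a string.

```python
from enum import Enum


class InterPredictionMode(Enum):
    SKIP = "skip"
    MERGE = "merge"
    AMVP = "amvp"


# Marker values taken from the example in the text.
MODE_BY_MARKER = {
    "0": InterPredictionMode.SKIP,
    "1": InterPredictionMode.MERGE,
    "2": InterPredictionMode.AMVP,
}


def inter_prediction_mode(marker: str) -> InterPredictionMode:
    """Return the inter-frame prediction mode identified by the marker."""
    if marker not in MODE_BY_MARKER:
        raise ValueError(f"unknown inter-frame prediction marker: {marker!r}")
    return MODE_BY_MARKER[marker]
```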

S2042: When the inter-frame prediction mode of the any coding block is a first-type prediction mode, determine a matched super-resolution pixel block of the matched coding block as the super-resolution pixel block of the any coding block.

In some feasible implementations, there is no pixel prediction residual in the skip mode of high efficiency video coding, but there is a pixel prediction residual in the merge mode and the AMVP mode of the high efficiency video coding. Therefore, in this embodiment of this application, the skip mode in which there is no pixel prediction residual may be used as the first-type prediction mode, and the merge mode and the AMVP mode in which there is the pixel prediction residual may be used as a second-type prediction mode. When the inter-frame prediction mode of the any coding block is the first-type prediction mode, the terminal device may obtain the matched super-resolution pixel block corresponding to the matched coding block in a reference image frame identified by the reference image frame index in the encoded information. In the skip mode, the pixel information of the any coding block is the same as the pixel information of the matched coding block. Therefore, the terminal device may determine the matched super-resolution pixel block of the matched coding block as the super-resolution pixel block of the any coding block. The matched super-resolution pixel block may include a pixel block obtained after super-resolution processing is performed on the pixel information of the matched coding block by using the super-resolution model. A coding block size of the any coding block may be the same as a coding block size of the matched coding block. For example, the coding block size of the any coding block is 8×8 pixels, and the coding block size of the matched coding block is also 8×8 pixels. A time sequence of the reference image frame in the video stream is prior to a time sequence of the image frame to which the any coding block belongs, that is, the reference image frame is an image frame before the image frame to which the any coding block belongs.
Therefore, by the time the any coding block is processed, super-resolution pixel blocks have already been obtained for all coding blocks in the reference image frame, and thus the previously obtained matched super-resolution pixel block of the matched coding block may be directly obtained. In addition, because the terminal device directly uses the obtained matched super-resolution pixel block as the super-resolution pixel block of the any coding block, super-resolution processing does not need to be performed on the pixel information of the any coding block. Therefore, power consumption generated during super-resolution processing can be reduced, that is, power consumption of the terminal device is reduced. In addition, time for super-resolution processing of a single frame of image can be shortened, and thus super-resolution processing delays of any two frames of images can be shortened.
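The reuse in step S2042 can be pictured as a lookup in a store of previously obtained super-resolution pixel blocks. The sketch below is an assumption about one possible bookkeeping scheme, keyed by the reference image frame index and the matched coding block index; the class and function names are hypothetical.

```python
from typing import Dict, List, Tuple

PixelBlock = List[List[int]]
BlockKey = Tuple[int, int]  # (image frame index, coding block index)


class SuperResolutionBlockStore:
    """Keeps the super-resolution pixel blocks of already processed frames."""

    def __init__(self) -> None:
        self._blocks: Dict[BlockKey, PixelBlock] = {}

    def store(self, frame_index: int, block_index: int, sr_block: PixelBlock) -> None:
        self._blocks[(frame_index, block_index)] = sr_block

    def lookup(self, frame_index: int, block_index: int) -> PixelBlock:
        return self._blocks[(frame_index, block_index)]


def super_resolution_block_for_skip(
    store: SuperResolutionBlockStore,
    reference_frame_index: int,
    matched_block_index: int,
) -> PixelBlock:
    """Skip mode: reuse the matched block's result, no model inference needed."""
    return store.lookup(reference_frame_index, matched_block_index)


store = SuperResolutionBlockStore()
store.store(frame_index=0, block_index=7, sr_block=[[1, 1], [1, 1]])
reused = super_resolution_block_for_skip(store, 0, 7)
```

Because the lookup replaces a model inference, the cost for a skip-mode coding block in this sketch is a dictionary access rather than a forward pass through the super-resolution model.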

S2043: When the inter-frame prediction mode of the any coding block is the second-type prediction mode, determine a pixel prediction residual of the any coding block, and determine the pixel information of the any coding block based on the pixel information of the matched coding block and the pixel prediction residual.

In some feasible implementations, because there is the pixel prediction residual in the second-type prediction mode (for example, the merge mode and the AMVP mode), when the inter-frame prediction mode of the any coding block is the second-type prediction mode, the terminal device may obtain the pixel prediction residual of the any coding block, and may obtain the pixel information of the matched coding block in the reference image frame. Because the reference image frame is the image frame before the image frame to which the any coding block belongs, before the any coding block is processed, pixel information of all the coding blocks in the reference image frame is determined. Therefore, the pixel information of the matched coding block may be directly obtained. The terminal device may determine the pixel information of the any coding block based on the pixel prediction residual and the pixel information of the matched coding block. For example, a result obtained after the pixel prediction residual is superimposed on the pixel information of the matched coding block is used as the pixel information of the any coding block. The coding block size of the any coding block may be the same as the coding block size of the matched coding block. If both the pixel prediction residual and the pixel information are represented by sets, for example, the pixel prediction residual is represented by a set pixel-s, and the pixel information is represented by a set pixel-m, a size of the set pixel-m may be the same as a size of the set pixel-s. That is, a quantity of elements in the set pixel-m may be equal to a quantity of elements in the set pixel-s; and superimposing the pixel prediction residual on the pixel information may be adding the elements in one set to the corresponding elements in the other set.

In some feasible implementations, the pixel prediction residual may be a pixel residual matrix, and the pixel information may be a pixel matrix. The terminal device may determine a sum of the obtained pixel matrix of the matched coding block and the obtained pixel residual matrix as a pixel matrix of the any coding block. A size of the pixel matrix may be the same as a size of the pixel residual matrix, and an element in the pixel matrix may be a pixel value of a pixel.
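The matrix sum in step S2043 can be sketched as an element-wise addition. The sketch assumes single-channel integer pixel values stored as nested lists and does not clip the result to a value range, because no clipping is described above; the function name is hypothetical.

```python
def reconstruct_pixel_matrix(matched_pixels, residual):
    """Add the pixel residual matrix to the matched block's pixel matrix."""
    same_shape = len(matched_pixels) == len(residual) and all(
        len(m_row) == len(r_row) for m_row, r_row in zip(matched_pixels, residual)
    )
    if not same_shape:
        raise ValueError("pixel matrix and residual matrix must have the same size")
    return [
        [m + r for m, r in zip(m_row, r_row)]
        for m_row, r_row in zip(matched_pixels, residual)
    ]


pixels = reconstruct_pixel_matrix(
    matched_pixels=[[10, 20], [30, 40]],
    residual=[[1, -2], [0, 3]],
)
```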

S2044: Perform super-resolution processing on the pixel information of the any coding block, to obtain the super-resolution pixel block of the any coding block.

In some feasible implementations, after determining the pixel information of the any coding block, the terminal device may input the pixel information of the any coding block into the super-resolution model for super-resolution processing, and may obtain a pixel block output by the super-resolution model for the pixel information of the any coding block. The terminal device may determine the pixel block output by the super-resolution model as the super-resolution pixel block of the any coding block. The pixel information may be a pixel matrix. In this case, the edge pixel padding manner used by the super-resolution model may be the same padding (same padding), that is, a “padding” parameter of the super-resolution model is “same”. Because the terminal device performs super-resolution processing on pixel information of a coding block by using the super-resolution model, a super-resolution effect of the coding block can be ensured, and the super-resolution model may be a small-scale model (for example, a super-resolution model whose model size is less than 100 KB). Therefore, a stalling problem that occurs when a larger-scale super-resolution model performs video super-resolution processing can be resolved.
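Steps S2042 to S2044 can be combined into one dispatch function. In the sketch below, the super-resolution model is replaced by a nearest-neighbor upscaler purely as a stand-in so that the example runs without a trained network; that substitution, the mode strings, and all function names are assumptions made for illustration.

```python
def nearest_neighbor_upscale(pixels, scale=2):
    """Stand-in for the super-resolution model (NOT the model described above)."""
    return [
        [value for value in row for _ in range(scale)]
        for row in pixels
        for _ in range(scale)
    ]


def super_resolve_coding_block(mode, matched_pixels, matched_sr_block,
                               residual=None, sr_model=nearest_neighbor_upscale):
    """Return the super-resolution pixel block of one coding block in a P frame."""
    if mode == "skip":
        # First-type prediction mode: reuse the matched block's result (S2042).
        return matched_sr_block
    if mode in ("merge", "amvp"):
        # Second-type prediction mode: rebuild the pixels (S2043), then run the model (S2044).
        pixels = [
            [m + r for m, r in zip(m_row, r_row)]
            for m_row, r_row in zip(matched_pixels, residual)
        ]
        return sr_model(pixels)
    raise ValueError(f"unknown inter-frame prediction mode: {mode!r}")


matched = [[1, 2]]
matched_sr = nearest_neighbor_upscale(matched)
skip_result = super_resolve_coding_block("skip", matched, matched_sr)
amvp_result = super_resolve_coding_block("amvp", matched, matched_sr, residual=[[1, 1]])
```

Only the second branch invokes the model, which is where the saving in computation for skip-mode coding blocks comes from in this sketch.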

In some feasible implementations, the terminal device may determine a preset size, then may determine, in the any coding block, a pixel region (a central region) whose size is the preset size, and may use the determined pixel region as a second pixel region. The terminal device may use, as a first pixel region (an edge region), a region that is in the any coding block and that does not overlap with the second pixel region. In this case, pixel information of the first pixel region may be the pixel information of the any coding block other than pixel information of the second pixel region (that is, the central region). The terminal device may perform pixel padding on the first pixel region, and may input the pixel information of the post-pixel-padding first pixel region into the super-resolution model for super-resolution processing, to obtain a first pixel block corresponding to the any coding block. The terminal device may perform pixel stitching on the first pixel block and a pixel block obtained after the super-resolution model performs super-resolution processing on the pixel information of the any coding block, to obtain the super-resolution pixel block of the any coding block. The terminal device extracts the edge region of the coding block, then performs super-resolution processing on the post-pixel-padding edge region to obtain the first pixel block, and finally stitches the first pixel block and the pixel block obtained after super-resolution processing is performed on the pixel information of the coding block, to obtain the super-resolution pixel block of the coding block. This can significantly ease a problem of a boundary between coding blocks, and thus can ensure a super-resolution processing effect of a single frame of image.
A super-resolution processing manner of the pixel information of the post-pixel-padding first pixel region is the same as that of the pixel information of the any coding block, that is, the same super-resolution model is used; and in this case, an edge pixel padding manner used by the super-resolution model may be the valid padding (valid padding), that is, a “padding” parameter of the super-resolution model is “valid”. The preset size may be determined based on a convolutional layer quantity and a convolution kernel size that are of the super-resolution model.

For example, FIG. 6 is a schematic diagram of pixel padding in a first pixel region according to an embodiment of this application. In FIG. 6, it is assumed that the determined preset size is 3×3, the any coding block is a coding block A, and a coding block size of the coding block A is 5×5 pixels. The terminal device extracts an edge region of the coding block A. As shown in FIG. 6, the extracted edge region is a circle of pixels on the outermost periphery of the coding block A. The terminal device performs pixel padding on the edge region of the coding block A to obtain a padded block shown in FIG. 6.
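The split of a coding block into a central region and an edge region can be sketched with a boolean mask. The sketch assumes that the central region of the preset size is centered in the coding block, which matches the 5×5 block and 3×3 preset size in the example; it does not reproduce the pixel padding of FIG. 6, whose exact form is not given in the text. The function name is hypothetical.

```python
def edge_region_mask(block_h, block_w, preset_h, preset_w):
    """Return a mask that is True for edge-region pixels and False for the central region."""
    if preset_h > block_h or preset_w > block_w:
        raise ValueError("preset size must not exceed the coding block size")
    # Assumption: the central (second) pixel region is centered in the block.
    top = (block_h - preset_h) // 2
    left = (block_w - preset_w) // 2
    return [
        [
            not (top <= r < top + preset_h and left <= c < left + preset_w)
            for c in range(block_w)
        ]
        for r in range(block_h)
    ]


mask = edge_region_mask(5, 5, 3, 3)
edge_pixel_count = sum(flag for row in mask for flag in row)
```

For the 5×5 coding block A with a 3×3 preset size, the mask marks the outermost ring of 16 pixels as the edge region and leaves the inner 9 pixels as the central region.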

In some feasible implementations, step S2043 and step S2044 in this embodiment of this application may be performed before step S2042, after step S2042, or simultaneously with step S2042. An execution order of step S2043 and step S2044 relative to step S2042 is not limited in this embodiment of this application.

S205: Stitch super-resolution pixel blocks of all coding blocks that belong to a same image frame in the video stream to obtain a super-resolution image.

In some feasible implementations, the terminal device may obtain, according to the implementation of step S203 and step S204, the super-resolution pixel blocks of all the coding blocks that belong to the same image frame (a frame type of the image frame is the P frame) in the video stream. For example, FIG. 7a-1 and FIG. 7a-2 are a schematic diagram of determining a super-resolution pixel block according to an embodiment of this application. As shown in FIG. 7a-1 and FIG. 7a-2, a reference image frame of an image frame A is an image frame B, where b1 is a coding block whose inter-frame prediction mode is the skip mode, b2 is a coding block whose inter-frame prediction mode is the AMVP mode, b3 is a coding block whose inter-frame prediction mode is the merge mode, and the coding blocks b1, b2, and b3 all belong to the image frame B. A super-resolution pixel block corresponding to b1 is pb1, a super-resolution pixel block corresponding to b2 is pb2, and a super-resolution pixel block corresponding to b3 is pb3. In FIG. 7a-2, a1 is a coding block whose inter-frame prediction mode is the skip mode, a2 is a coding block whose inter-frame prediction mode is the AMVP mode, a3 is a coding block whose inter-frame prediction mode is the merge mode, and the coding blocks a1, a2, and a3 all belong to the image frame A. A super-resolution pixel block corresponding to a1 is pa1, a super-resolution pixel block corresponding to a2 is pa2, and a super-resolution pixel block corresponding to a3 is pa3. A matched coding block of the coding block a1 is b1 in the image frame B, a matched coding block of the coding block a2 is b2 in the image frame B, and a matched coding block of the coding block a3 is b3 in the image frame B. In FIG. 7a-2, because the inter-frame prediction mode of a1 is the first-type prediction mode, pa1 is the same as pb1; and because the inter-frame prediction modes of a2 and a3 are the second-type prediction mode, super-resolution processing is separately performed on pixel information of a2 and a3, to obtain pa2 and pa3.

After obtaining the super-resolution pixel blocks of all the coding blocks that belong to the same image frame (the frame type of the image frame is the P frame) in the video stream, the terminal device may stitch the super-resolution pixel blocks of all the coding blocks that belong to the same image frame, to obtain the super-resolution image. The super-resolution image is used for generating a super-resolution video. Specifically, the encoded information of the any coding block may further include an image frame index, a motion vector predictor MVP and/or a motion vector difference MVD, and the like. When the inter-frame prediction mode of the any coding block is the first-type prediction mode (the skip mode), the MVP in the encoded information is used as a motion vector MV of the any coding block; and when the inter-frame prediction mode of the any coding block is the second-type prediction mode (the merge mode or the AMVP mode), a sum of the MVP and the MVD in the encoded information is used as the MV of the any coding block, where the MVD in the merge mode is 0. The terminal device may obtain a location of the matched coding block in the reference image frame, and may determine, based on the MV and the location of the matched coding block in the reference image frame, a location of the any coding block in an image frame (that is, an image frame identified by the image frame index) to which the any coding block belongs. The terminal device may stitch the super-resolution pixel blocks of all the coding blocks that belong to the same image frame in the video stream based on locations of all the coding blocks in the image frame to which all the coding blocks belong, to obtain the super-resolution image of the image frame to which all the coding blocks belong. An image frame that is earlier in a time sequence in the video stream may be used as a reference image frame of an image frame that is later in the time sequence in the video stream.
The terminal device may generate the super-resolution video based on super-resolution images of all the image frames in the video stream. The terminal device uses different super-resolution processing manners for coding blocks with different inter-frame prediction modes, to implement super-resolution processing on an entire image frame, and reduce a calculation amount of the super-resolution model. Therefore, power consumption can be reduced on a premise that an effect of super-resolution processing performed on a single frame of image in a video is ensured, and super-resolution processing delays of any two frames of images in the video can be shortened.
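The rule for deriving the MV of a coding block from the MVP and the MVD depends on the prediction mode and can be sketched as follows. The mode strings and the function name are hypothetical, and motion vectors are assumed to be pairs of integer components.

```python
from typing import Tuple

Vector = Tuple[int, int]

FIRST_TYPE_MODES = {"skip"}            # no pixel prediction residual
SECOND_TYPE_MODES = {"merge", "amvp"}  # pixel prediction residual present


def motion_vector(mode: str, mvp: Vector, mvd: Vector = (0, 0)) -> Vector:
    """Derive the MV of a coding block from its MVP and, where applicable, its MVD."""
    if mode in FIRST_TYPE_MODES:
        # Skip mode: the MVP is used directly as the MV.
        return mvp
    if mode in SECOND_TYPE_MODES:
        # Merge or AMVP mode: MV = MVP + MVD (the MVD is 0 in the merge mode,
        # which the caller expresses here by leaving the default value).
        return (mvp[0] + mvd[0], mvp[1] + mvd[1])
    raise ValueError(f"unknown inter-frame prediction mode: {mode!r}")
```

For example, `motion_vector("amvp", (3, 4), (1, -1))` gives `(4, 3)`, whereas the skip mode and the merge mode both return the MVP `(3, 4)` unchanged.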

For example, FIG. 7b is a schematic diagram of locations of coding blocks according to an embodiment of this application. As shown in FIG. 7b, a coding block in an image frame is used as an example. It is assumed that an image frame to which a coding block d1 belongs is an image frame D, an image frame to which a coding block c1 belongs is an image frame C, a reference image frame of the image frame D is the image frame C, and a matched coding block of the coding block d1 is c1. In FIG. 7b, the image frames are placed in a two-dimensional coordinate system (an origin O, an x-axis, and a y-axis). It is assumed that displacements of pixels in one coding block relative to the origin O are the same. If a location of the coding block c1 in the image frame C is a location shown in FIG. 7b, the terminal device may map the location of the coding block c1 in the image frame C to the image frame D, and then move the coding block c1 in the image frame D based on a motion vector MV (including a magnitude and a direction). In this case, a location of the coding block d1 in the image frame D is obtained.
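The location mapping of FIG. 7b reduces to a vector addition, sketched below. The coordinate convention (a location is the (x, y) position of a block's reference corner) and the function names are assumptions. The second function, which scales a location by the magnification factor for placement in the super-resolution image, is also an assumption that follows from the magnification being an integer multiple; it is not spelled out in the text.

```python
from typing import Tuple

Point = Tuple[int, int]


def locate_coding_block(matched_location: Point, mv: Point) -> Point:
    """Map the matched block's location into the current frame and shift it by the MV."""
    return (matched_location[0] + mv[0], matched_location[1] + mv[1])


def location_in_super_resolution_image(location: Point, scale: int) -> Point:
    """Assumed helper: scale a block location by the integer magnification factor."""
    return (location[0] * scale, location[1] * scale)


d1_location = locate_coding_block(matched_location=(16, 8), mv=(4, -8))
d1_sr_location = location_in_super_resolution_image(d1_location, scale=2)
```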

S206: If the frame type of the image frame to which the any coding block in the video stream belongs is the I frame, refer to the image frame to which the any coding block belongs as a first image frame, and obtain pixel information of the first image frame.

In some feasible implementations, if the frame type, obtained in step S202, of the image frame to which the any coding block in the video stream belongs is the I frame, the terminal device may obtain all coding blocks that belong to the same image frame as the any coding block, and may stitch the any coding block and all the coding blocks that belong to the same image frame as the any coding block to obtain the first image frame to which the any coding block belongs. The terminal device may obtain the pixel information of the first image frame, where the pixel information of the first image frame may include pixel information of all the coding blocks that belong to the first image frame. During stitching, the coding blocks are stitched together based on locations of the coding blocks in the image frame, to obtain the image frame.

For example, FIG. 8 is a schematic diagram of coding block stitching according to an embodiment of this application. In FIG. 8, it is assumed that the any coding block is a coding block A, and coding blocks that belong to a same image frame as the coding block A include a coding block B, a coding block C, and a coding block D. The terminal device stitches the coding block A, the coding block B, the coding block C, and the coding block D based on a location of each of the coding blocks in the image frame, to obtain the first image frame.
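Stitching coding blocks by location can be sketched as copying each block into a frame buffer. The sketch assumes single-channel pixel values, locations given as the (row, column) of each block's top-left pixel, and a 2×2 layout of the blocks A, B, C, and D chosen only for illustration; the function name is hypothetical.

```python
def stitch_blocks(frame_h, frame_w, placed_blocks):
    """Copy each (top, left, block) entry into a frame of the given size."""
    frame = [[0] * frame_w for _ in range(frame_h)]
    for top, left, block in placed_blocks:
        for r, row in enumerate(block):
            if top + r >= frame_h or left + len(row) > frame_w:
                raise ValueError("coding block does not fit inside the image frame")
            frame[top + r][left:left + len(row)] = row
    return frame


block_a = [[1, 1], [1, 1]]
block_b = [[2, 2], [2, 2]]
block_c = [[3, 3], [3, 3]]
block_d = [[4, 4], [4, 4]]

first_image_frame = stitch_blocks(4, 4, [
    (0, 0, block_a),
    (0, 2, block_b),
    (2, 0, block_c),
    (2, 2, block_d),
])
```

The same routine could be used in step S205 for super-resolution pixel blocks, provided that the block locations and the frame size are expressed in super-resolution coordinates.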

S207: Perform super-resolution processing on the pixel information of the first image frame to obtain a super-resolution image of the first image frame.

In some feasible implementations, the terminal device may input the pixel information of the first image frame into the super-resolution model for super-resolution processing, to obtain the super-resolution image output by the super-resolution model for the pixel information of the first image frame, where the super-resolution image may be used for generating a super-resolution video. The terminal device uses different processing manners for image frames of different frame types in the video stream. For the P frame, the terminal device performs super-resolution processing on each coding block in the P frame; and for the I frame, the terminal device directly performs super-resolution processing on a complete I-frame image. This improves the video super-resolution processing method, and ensures a super-resolution processing effect of each image frame in the video stream.

S208: Combine super-resolution images of all image frames in the video stream into the super-resolution video.

In some feasible implementations, the video stream includes a plurality of image frames; and the terminal device may obtain the super-resolution images of all the image frames in the video stream, and combine the super-resolution images of all the image frames in the video stream into the super-resolution video for output.

In this embodiment of this application, when the frame type of the image frame to which the any coding block in the video stream belongs is the P frame, the terminal device determines the inter-frame prediction mode of the any coding block based on the inter-frame prediction marker in the encoded information of the any coding block. When the inter-frame prediction mode of the any coding block is the skip mode (the first-type prediction mode), the terminal device uses the matched super-resolution pixel block of the matched coding block in the reference image frame as the super-resolution pixel block of the any coding block. When the inter-frame prediction mode of the any coding block is the merge mode or the AMVP mode (the second-type prediction mode), the terminal device determines the pixel information of the any coding block based on the pixel information of the matched coding block and the pixel prediction residual, and performs super-resolution processing on the pixel information of the any coding block, to obtain the super-resolution pixel block of the any coding block. Finally, the terminal device stitches the super-resolution pixel blocks of all the coding blocks that belong to the same image frame in the video stream, to obtain the super-resolution image, where the super-resolution image is used for generating the super-resolution video. In this embodiment of this application, different processing is performed on coding blocks based on different inter-frame prediction modes in the P frame to obtain super-resolution pixel blocks, and the super-resolution pixel blocks are stitched to obtain a super-resolution image. This can reduce power consumption while ensuring an effect of super-resolution processing performed on a single frame of image in a video, and shorten super-resolution processing delays of any two frames of images in the video.

In an optional embodiment, the video super-resolution processing method provided in this embodiment of this application may be mainly applied to a P frame in a video stream. Therefore, for ease of understanding, internal implementation of super-resolution processing performed by a terminal device on a P frame in a video stream is briefly described by using the P frame in the video stream as an example in this embodiment of this application. FIG. 9 is a schematic diagram of internal implementation of super-resolution processing performed by a terminal device on a P frame according to an embodiment of this application.

Step 1: The terminal device extracts encoded information of a coding block by using a third-party media codec service (3rd party media codec service).

Step 2: If there is no pixel prediction residual in the encoded information, the terminal device obtains, by using an open graphics library (OpenGL) texture tracker (OpenGL Texture Tracker), a result obtained after super-resolution processing is performed on a matched coding block that is in a reference image frame and that has the same pixels as the coding block.

If there is no pixel prediction residual in the encoded information, it indicates that an inter-frame prediction mode of the coding block is a skip mode (skip mode).

Step 3: If there is a pixel prediction residual in the encoded information, the terminal device performs super-resolution processing on the coding block by using the OpenGL texture tracker and a super-resolution renderer (Super Res Renderer).

If there is the pixel prediction residual in the encoded information, it indicates that the inter-frame prediction mode of the coding block is an inter-frame mode (AMVP mode) or a merge mode (merge mode). When performing super-resolution processing on the coding block, the terminal device may not only input pixel information of the coding block into a super-resolution model for super-resolution processing, but also extract an edge region of the coding block, and perform pixel padding on the edge region. Then, the terminal device inputs pixel information of the post-pixel-padding edge region into the super-resolution model for super-resolution processing. Finally, the terminal device may stitch a result obtained after super-resolution processing is performed on the coding block and a result obtained after super-resolution processing is performed on the post-pixel-padding edge region, to obtain a super-resolution pixel block of the complete coding block.

Step 4: The terminal device performs converged stitching and rendering on results of step 2 and step 3 that are performed on a same video frame (or image frame) by using an image stitching technology (for example, ArcGIS Engine), to obtain a super-resolution image.

After the super-resolution image is obtained, the super-resolution image may be returned, by transmitting a video frame back to a pipeline, to a video application for output and display.

In this embodiment of this application, in the P frame, when there is no pixel prediction residual in the encoded information of the coding block, the terminal device directly uses a result obtained after super-resolution processing is performed on the matched coding block that is in the reference image frame (an I frame) and that has the same pixels as the coding block as a result obtained after super-resolution processing is performed on the coding block, and does not need to perform super-resolution processing on the coding block again. Therefore, power consumption generated during super-resolution processing can be reduced, that is, power consumption of the terminal device is reduced. In addition, time for super-resolution processing of a single frame of image can be shortened, and thus super-resolution processing delays of any two frames of images can be shortened. When there is the pixel prediction residual in the encoded information of the coding block, the terminal device needs to perform super-resolution processing on the coding block, so that a super-resolution effect of the coding block can be ensured. Therefore, a super-resolution effect of a super-resolution image can be ensured, where the super-resolution image is obtained by stitching results obtained after super-resolution processing is performed on all coding blocks in a same image frame.
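The residual-based branching of steps 2 and 3 can be illustrated with a minimal Python sketch. All names here (CodedBlock, PFrameSuperResolver, upscale_2x) are hypothetical, the cache keyed by reference image frame index and matched coding block index is an assumed data layout, and nearest-neighbour 2x upscaling merely stands in for the super-resolution model, which this application does not tie to any particular code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Block = List[List[int]]  # rows of pixel values


@dataclass
class CodedBlock:
    ref_frame_index: int      # identifies the reference image frame
    matched_block_index: int  # identifies the matched coding block in that frame
    pixels: Block             # decoded pixel information of this coding block
    has_residual: bool        # True when a pixel prediction residual is present


def upscale_2x(pixels: Block) -> Block:
    """Stand-in for the super-resolution model (assumption): nearest-neighbour 2x."""
    out: Block = []
    for row in pixels:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


class PFrameSuperResolver:
    """Reuses cached results for residual-free blocks, runs the model otherwise."""

    def __init__(self, sr_cache: Dict[Tuple[int, int], Block]) -> None:
        self.sr_cache = sr_cache  # super-resolution results of reference-frame blocks
        self.sr_calls = 0         # counts how often the model actually runs

    def resolve(self, block: CodedBlock) -> Block:
        if not block.has_residual:
            # Step 2: no residual, so reuse the matched block's cached result.
            return self.sr_cache[(block.ref_frame_index, block.matched_block_index)]
        # Step 3: a residual is present, so the block itself is super-resolved.
        self.sr_calls += 1
        return upscale_2x(block.pixels)
```

In this sketch, a P frame made up mostly of residual-free blocks triggers few model invocations, which is the source of the power and delay savings described above.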

The video super-resolution processing method in this embodiment of this application is described above in detail. To help better implement the foregoing solutions in this embodiment of this application, an embodiment of this application further provides a corresponding apparatus and device.

FIG. 10 is a schematic structural diagram of a video super-resolution processing apparatus according to an embodiment of this application. As shown in FIG. 10, the video super-resolution processing apparatus 100 includes:

a first obtaining module 101, configured to obtain encoded information of any coding block in a video stream; a first determining module 102, configured to determine an inter-frame prediction mode of the any coding block based on an inter-frame prediction marker included in the encoded information obtained by the first obtaining module 101; a second determining module 103, configured to determine a super-resolution pixel block of the any coding block based on the inter-frame prediction mode determined by the first determining module 102 and pixel information of a matched coding block, where the super-resolution pixel block is a pixel block obtained after super-resolution processing is performed on pixel information of the any coding block; and a stitching module 104, configured to stitch super-resolution pixel blocks of all coding blocks that belong to a same image frame in the video stream to obtain a super-resolution image, where the super-resolution image is used for generating a super-resolution video. A coding block belongs to an image frame in the video stream; one image frame in the video stream includes a plurality of coding blocks; the encoded information includes a reference image frame index, a matched coding block index, and the inter-frame prediction marker; and the matched coding block identified by the matched coding block index is a coding block that is in a reference image frame identified by the reference image frame index and that has a minimum pixel difference from the any coding block.

In some feasible implementations, the pixel information of the any coding block is the same as pixel information of the matched coding block. The second determining module 103 may include a first determining unit 1031. The first determining unit 1031 is configured to: when the inter-frame prediction mode determined by the first determining module 102 is a first-type prediction mode, determine a matched super-resolution pixel block of the matched coding block as the super-resolution pixel block of the any coding block, where the matched super-resolution pixel block includes a pixel block obtained after super-resolution processing is performed on the pixel information of the matched coding block.

In some feasible implementations, the second determining module 103 may further include a second determining unit 1032 and a super-resolution processing unit 1033. The second determining unit 1032 is configured to: when the inter-frame prediction mode determined by the first determining module 102 is a second-type prediction mode, determine a pixel prediction residual of the any coding block, and determine the pixel information of the any coding block based on the pixel information of the matched coding block and the pixel prediction residual. The super-resolution processing unit 1033 is configured to perform super-resolution processing on the pixel information that is of the any coding block and that is determined by the second determining unit 1032, to obtain the super-resolution pixel block of the any coding block.
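As an illustration of how the second determining unit 1032 might derive the pixel information from the matched coding block and the pixel prediction residual, the following Python sketch adds the two element by element. The function name reconstruct_pixels is hypothetical, and clipping to an 8-bit sample range is an assumption that this application does not state.

```python
from typing import List

Block = List[List[int]]  # rows of pixel values


def reconstruct_pixels(matched: Block, residual: Block, max_value: int = 255) -> Block:
    """Add the residual to the matched block's pixels, sample by sample.

    Clipping to [0, max_value] assumes 8-bit samples by default (an assumption).
    """
    if len(matched) != len(residual) or any(
        len(m_row) != len(r_row) for m_row, r_row in zip(matched, residual)
    ):
        raise ValueError("matched block and residual must have the same shape")
    return [
        [min(max(m + r, 0), max_value) for m, r in zip(m_row, r_row)]
        for m_row, r_row in zip(matched, residual)
    ]
```

The reconstructed block is then what the super-resolution processing unit 1033 would take as input.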

In some feasible implementations, the super-resolution processing unit 1033 may be specifically configured to:

obtain a first pixel region in the any coding block; perform pixel padding on the first pixel region, and perform super-resolution processing on pixel information of the post-pixel-padding first pixel region to obtain a first pixel block corresponding to the any coding block; and perform super-resolution processing on the pixel information of the any coding block, and stitch the first pixel block and a pixel block obtained after super-resolution processing is performed on the pixel information of the any coding block, to obtain the super-resolution pixel block of the any coding block. The pixel information of the first pixel region is pixel information in the pixel information of the any coding block except pixel information of a second pixel region, and the second pixel region is a pixel region of a preset size in the any coding block.

In some feasible implementations, the super-resolution processing unit 1033 may further be specifically configured to: determine the second pixel region of the preset size in the any coding block, and determine, as the first pixel region, a region that is in the any coding block and that does not overlap with the second pixel region.
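One possible reading of the padding-and-stitching procedure is sketched below in Python. Several points are assumptions made only for illustration: the second pixel region is taken to be centred in the coding block, pixel padding is done by replicating border pixels outward, the padded input keeps the inner pixels as context, and toy_sr_model (a 3x3 mean filter with zero borders followed by 2x nearest-neighbour upscaling) stands in for the real super-resolution model. All function names are hypothetical.

```python
from typing import List

Block = List[List[int]]  # rows of pixel values
SCALE = 2                # upscaling factor of the stand-in model


def toy_sr_model(pixels: Block) -> Block:
    """Stand-in model (assumption): 3x3 mean with zero borders, then 2x upscaling."""
    h, w = len(pixels), len(pixels[0])
    smooth = [[sum(pixels[y + dy][x + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if 0 <= y + dy < h and 0 <= x + dx < w) // 9
               for x in range(w)] for y in range(h)]
    return [[smooth[y // SCALE][x // SCALE] for x in range(SCALE * w)]
            for y in range(SCALE * h)]


def replicate_pad(pixels: Block, pad: int) -> Block:
    """Pixel padding by repeating the nearest border pixel outward."""
    h, w = len(pixels), len(pixels[0])
    return [[pixels[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]
             for x in range(-pad, w + pad)] for y in range(-pad, h + pad)]


def super_resolve_block(pixels: Block, inner_size: int, pad: int = 1) -> Block:
    """Stitch the padded edge-region result with the plain result of the block."""
    h, w = len(pixels), len(pixels[0])
    if inner_size > min(h, w):
        raise ValueError("second pixel region must fit inside the coding block")
    plain = toy_sr_model(pixels)                       # result for the whole block
    padded = toy_sr_model(replicate_pad(pixels, pad))  # result after pixel padding
    cropped = [row[SCALE * pad:SCALE * (pad + w)]
               for row in padded[SCALE * pad:SCALE * (pad + h)]]
    top, left = (h - inner_size) // 2, (w - inner_size) // 2

    def in_second_region(y: int, x: int) -> bool:
        return (top <= y // SCALE < top + inner_size
                and left <= x // SCALE < left + inner_size)

    # The second (inner) region comes from the plain result; the first (edge)
    # region comes from the padded result, which has no border darkening.
    return [[plain[y][x] if in_second_region(y, x) else cropped[y][x]
             for x in range(SCALE * w)] for y in range(SCALE * h)]
```

With this stand-in model, a uniform block loses brightness at its border when processed alone, whereas the stitched result stays uniform, which illustrates why the edge region is processed separately after padding.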

In some feasible implementations, the encoded information may further include a motion vector predictor MVP and/or a motion vector difference MVD. The stitching module 104 may be specifically configured to: determine a motion vector MV of the any coding block based on the inter-frame prediction mode, and the MVP and/or the MVD; determine, based on the MV and a location of the matched coding block in the reference image frame, a location of the any coding block in an image frame to which the any coding block belongs; and stitch the super-resolution pixel blocks of all the coding blocks that belong to the same image frame in the video stream based on locations of all the coding blocks in the image frame to which all the coding blocks belong, to obtain a super-resolution image corresponding to the image frame to which all the coding blocks belong.
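The location-based stitching can be sketched in Python as follows. The sketch assumes the common convention that the MV equals the MVP plus the MVD in the AMVP mode and equals the MVP in the skip mode and the merge mode, and it assumes that the MV points from the coding block to the matched coding block, so that the block location is the matched location minus the MV. The sign convention and all function names are assumptions made for illustration.

```python
from typing import List, Tuple

Block = List[List[int]]  # rows of pixel values
Vec = Tuple[int, int]    # (x, y) in pixels of the original-resolution frame


def motion_vector(mode: str, mvp: Vec, mvd: Vec = (0, 0)) -> Vec:
    """Derive the MV from the prediction mode, the MVP, and (for AMVP) the MVD."""
    if mode == "amvp":
        return (mvp[0] + mvd[0], mvp[1] + mvd[1])
    if mode in ("skip", "merge"):
        return mvp
    raise ValueError(f"unknown inter-frame prediction mode: {mode}")


def block_location(matched_location: Vec, mv: Vec) -> Vec:
    """Assumed convention: matched location = block location + MV."""
    return (matched_location[0] - mv[0], matched_location[1] - mv[1])


def stitch(sr_blocks: List[Tuple[Vec, Block]], width: int, height: int,
           scale: int) -> Block:
    """Paste each super-resolution pixel block at its scaled location."""
    canvas = [[0] * (width * scale) for _ in range(height * scale)]
    for (x, y), block in sr_blocks:
        for dy, row in enumerate(block):
            for dx, pixel in enumerate(row):
                canvas[y * scale + dy][x * scale + dx] = pixel
    return canvas
```

For example, with an MVP of (2, -1), an MVD of (1, 1), and a matched coding block at (5, 4), the AMVP-mode MV is (3, 0) and the coding block is placed at (2, 4) before its location is scaled for stitching.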

In some feasible implementations, a frame type of the image frame to which the any coding block belongs is a P frame.

In some feasible implementations, the video stream further includes a first image frame whose frame type is an I frame. The apparatus 100 further includes a second obtaining module 105 and a super-resolution processing module 106. The second obtaining module 105 is configured to obtain pixel information of the first image frame. The super-resolution processing module 106 is configured to perform super-resolution processing on the pixel information that is of the first image frame and that is obtained by the second obtaining module 105, to obtain a super-resolution image of the first image frame.

In some feasible implementations, the apparatus 100 may further include a combination module 107. The combination module 107 is configured to combine super-resolution images of all image frames in the video stream into the super-resolution video.

The first obtaining module 101, the first determining module 102, the second determining module 103, the stitching module 104, the second obtaining module 105, and the super-resolution processing module 106, and/or the combination module 107 may be one module, for example, a processing module. The first determining unit 1031 and the second determining unit 1032, and/or the super-resolution processing unit 1033 may also be one unit, for example, a processing unit.

In specific implementation, for implementation of each module and/or unit, reference may further be made to corresponding descriptions of the terminal device in the method embodiment shown in FIG. 2 or FIG. 5, so that the method and the functions performed by the terminal device in the foregoing embodiment are performed.

In this embodiment of this application, when the frame type of the image frame to which the any coding block in the video stream belongs is the P frame, the video super-resolution processing apparatus determines the inter-frame prediction mode of the any coding block based on the inter-frame prediction marker in encoded information of the any coding block. When the inter-frame prediction mode of the any coding block is a skip mode or a merge mode, the video super-resolution processing apparatus uses the matched super-resolution pixel block of the matched coding block in the reference image as the super-resolution pixel block of the any coding block; or when the inter-frame prediction mode of the any coding block is an AMVP mode, the video super-resolution processing apparatus performs super-resolution processing on the pixel information of the any coding block, to obtain the super-resolution pixel block of the any coding block. Finally, the video super-resolution processing apparatus stitches the super-resolution pixel blocks of all the coding blocks that belong to the same image frame in the video stream, to obtain the super-resolution image, where the super-resolution image is used for generating the super-resolution video. In this embodiment of this application, different processing is performed on coding blocks based on different inter-frame prediction modes in the P frame to obtain super-resolution pixel blocks, and the super-resolution pixel blocks are stitched to obtain a super-resolution image. This can reduce power consumption while ensuring an effect of super-resolution processing performed on a single frame of image in a video, and shorten super-resolution processing delays of any two frames of images in the video.

FIG. 11 is a schematic structural diagram of a terminal device according to an embodiment of this application. As shown in FIG. 11, the terminal device 1000 provided in this embodiment of this application includes a processor 1001, a memory 1002, a transceiver 1003, and a bus system 1004.

The processor 1001, the memory 1002, and the transceiver 1003 are connected by using the bus system 1004.

The memory 1002 is configured to store a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory 1002 includes but is not limited to a random access memory (random access memory, RAM), a read-only memory (read-only memory, ROM), an erasable programmable read-only memory (erasable programmable read only memory, EPROM), or a portable read-only memory (compact disc read-only memory, CD-ROM). FIG. 11 shows only one memory. Certainly, a plurality of memories may also be disposed as required. Alternatively, the memory 1002 may be a memory in the processor 1001, which is not limited herein.

The memory 1002 stores the following elements, that is, an executable module, a unit or a data structure, or a subset thereof, or an extended set thereof:

The operation instructions include various operation instructions, and are configured to implement various operations.

An operating system includes various system programs, and is configured to implement various basic services and process hardware-based tasks.

The processor 1001 controls an operation of the terminal device 1000. The processor 1001 may be one or more central processing units (central processing unit, CPU). When the processor 1001 is one CPU, the CPU may be a single-core CPU, or may be a multi-core CPU.

During specific application, components of the terminal device 1000 are coupled together by using the bus system 1004. In addition to a data bus, the bus system 1004 may further include a power bus, a control bus, and a status signal bus. However, for clear description, various types of buses in FIG. 11 are marked as the bus system 1004. For ease of representation, the terminal device is merely schematically illustrated in FIG. 11.

The method provided in FIG. 2, FIG. 5, or FIG. 9 in the foregoing embodiments of this application, or the method provided in any other embodiment may be applied to the processor 1001, or implemented by the processor 1001. The processor 1001 may be an integrated circuit chip and has a data processing capability. In an implementation process, steps in the foregoing methods can be implemented by using a hardware integrated logical circuit in the processor 1001, or by using instructions in a form of software. The processor 1001 may be a general-purpose processor, a digital signal processor (digital signal processor, DSP), an application-specific integrated circuit (application specific integrated circuit, ASIC), a field-programmable gate array (field-programmable gate array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, the steps, and logical block diagrams that are disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. Steps of the methods disclosed with reference to the embodiments of this application may be directly executed and accomplished by a hardware decoding processor, or may be executed and accomplished by using a combination of hardware and software modules in the decoding processor. A software module may be located in a mature storage medium in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1002. The processor 1001 reads data in the memory 1002, and performs, by using hardware of the processor 1001, the method steps of the terminal device that are described in FIG. 2, FIG. 5, or FIG. 9, or the method steps of the terminal device that are described in the foregoing embodiments.

The terminal device in this application includes but is not limited to a smartphone, a vehicle-mounted apparatus, a personal computer, an artificial intelligence device, a tablet computer, a personal digital assistant, a smart wearable device (for example, a smart watch, a smart band, or smart glasses), a smart television (or referred to as a smart big screen, a smart screen, a big screen TV, or the like), an intelligent voice device (such as a smart speaker), a virtual reality/mixed reality/enhanced display device, and the like.

An embodiment of this application further provides a computer program product, where the computer program product includes computer program code, and when the computer program code is run on a computer, the computer is enabled to perform the method described in any one of the foregoing embodiments.

An embodiment of this application further provides a chip, including a processor. The processor is configured to read and execute a computer program stored in a memory, to perform the video super-resolution processing method in any possible implementation in FIG. 2, FIG. 5, or FIG. 9. Optionally, the chip further includes the memory, and the memory is connected to the processor by using a circuit or a wire. Further optionally, the chip further includes a communications interface, and the processor is connected to the communications interface. The communications interface is configured to receive data and/or information that needs to be processed. The processor obtains the data and/or information from the communications interface, processes the data and/or information, and outputs a processing result through the communications interface. The communications interface may be an input/output interface.

Optionally, the processor and the memory may be physically independent units, or the memory may be integrated with the processor.

A person of ordinary skill in the art may understand that all or some of the processes of the methods in the embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium. When the program runs, the processes of the methods in the embodiments may be performed. The foregoing storage medium includes any medium that can store program code, such as a ROM, a random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing descriptions are merely specific implementations of this application, but are not intended to limit the protection scope of this application. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

What is claimed is:
 1. A video super-resolution processing method, comprising: obtaining encoded information of any coding block in a video stream, wherein a coding block belongs to an image frame in the video stream; one image frame in the video stream comprises a plurality of coding blocks; the encoded information comprises a reference image frame index, a matched coding block index, and an inter-frame prediction marker; and a matched coding block identified by the matched coding block index is a coding block that is in a reference image frame identified by the reference image frame index and that has a minimum pixel difference from the any coding block; determining an inter-frame prediction mode of the any coding block based on the inter-frame prediction marker, and determining a super-resolution pixel block of the any coding block based on the inter-frame prediction mode and pixel information of the matched coding block, wherein the super-resolution pixel block is a pixel block obtained after super-resolution processing is performed on pixel information of the any coding block; and stitching super-resolution pixel blocks of all coding blocks that belong to a same image frame in the video stream to obtain a super-resolution image, wherein the super-resolution image is used for generating a super-resolution video.
 2. The method according to claim 1, wherein the pixel information of the any coding block is the same as the pixel information of the matched coding block; and the determining a super-resolution pixel block of the any coding block based on the inter-frame prediction mode and pixel information of the matched coding block comprises: when the inter-frame prediction mode is a first-type prediction mode, determining a matched super-resolution pixel block of the matched coding block as the super-resolution pixel block of the any coding block, wherein the matched super-resolution pixel block comprises a pixel block obtained after super-resolution processing is performed on the pixel information of the matched coding block.
 3. The method according to claim 1, wherein the determining a super-resolution pixel block of the any coding block based on the inter-frame prediction mode and pixel information of the matched coding block comprises: when the inter-frame prediction mode is a second-type prediction mode, determining a pixel prediction residual of the any coding block, and determining the pixel information of the any coding block based on the pixel information of the matched coding block and the pixel prediction residual; and performing super-resolution processing on the pixel information of the any coding block, to obtain the super-resolution pixel block of the any coding block.
 4. The method according to claim 3, wherein the performing super-resolution processing on the pixel information of the any coding block, to obtain the super-resolution pixel block of the any coding block comprises: obtaining a first pixel region in the any coding block, wherein pixel information of the first pixel region is pixel information in the pixel information of the any coding block except pixel information of a second pixel region, and the second pixel region is a pixel region of a preset size in the any coding block; performing pixel padding on the first pixel region, and performing super-resolution processing on the pixel information of the post-pixel-padding first pixel region to obtain a first pixel block corresponding to the any coding block; and performing super-resolution processing on the pixel information of the any coding block, and stitching the first pixel block and a pixel block obtained after super-resolution processing is performed on the pixel information of the any coding block, to obtain the super-resolution pixel block of the any coding block.
 5. The method according to claim 4, wherein the obtaining a first pixel region in the any coding block comprises: determining the second pixel region of the preset size in the any coding block, and determining, as the first pixel region, a region that is in the any coding block and that does not overlap with the second pixel region.
 6. The method according to claim 1, wherein the encoded information further comprises a motion vector predictor MVP and/or a motion vector difference MVD; and the stitching super-resolution pixel blocks of all coding blocks that belong to a same image frame in the video stream to obtain a super-resolution image comprises: determining a motion vector MV of the any coding block based on the inter-frame prediction mode, and the MVP and/or the MVD; determining, based on the MV and a location of the matched coding block in the reference image frame, a location of the any coding block in an image frame to which the any coding block belongs; and stitching the super-resolution pixel blocks of all the coding blocks that belong to the same image frame in the video stream based on locations of all the coding blocks in the image frame to which all the coding blocks belong, to obtain the super-resolution image of the image frame to which all the coding blocks belong.
 7. The method according to claim 1, wherein a frame type of an image frame to which the any coding block belongs is a P frame.
 8. A terminal device, comprising a processor and a memory, wherein the memory is configured to store a computer program; the computer program comprises program instructions; and when the processor runs the program instructions, the terminal device is enabled to perform: obtaining encoded information of any coding block in a video stream, wherein a coding block belongs to an image frame in the video stream; one image frame in the video stream comprises a plurality of coding blocks; the encoded information comprises a reference image frame index, a matched coding block index, and an inter-frame prediction marker; and a matched coding block identified by the matched coding block index is a coding block that is in a reference image frame identified by the reference image frame index and that has a minimum pixel difference from the any coding block; determining an inter-frame prediction mode of the any coding block based on the inter-frame prediction marker, and determining a super-resolution pixel block of the any coding block based on the inter-frame prediction mode and pixel information of the matched coding block, wherein the super-resolution pixel block is a pixel block obtained after super-resolution processing is performed on pixel information of the any coding block; and stitching super-resolution pixel blocks of all coding blocks that belong to a same image frame in the video stream to obtain a super-resolution image, wherein the super-resolution image is used for generating a super-resolution video.
 9. The terminal device according to claim 8, wherein the pixel information of the any coding block is the same as the pixel information of the matched coding block; and the determining a super-resolution pixel block of the any coding block based on the inter-frame prediction mode and pixel information of the matched coding block comprises: when the inter-frame prediction mode is a first-type prediction mode, determining a matched super-resolution pixel block of the matched coding block as the super-resolution pixel block of the any coding block, wherein the matched super-resolution pixel block comprises a pixel block obtained after super-resolution processing is performed on the pixel information of the matched coding block.
 10. The terminal device according to claim 8, wherein the determining a super-resolution pixel block of the any coding block based on the inter-frame prediction mode and pixel information of the matched coding block comprises: when the inter-frame prediction mode is a second-type prediction mode, determining a pixel prediction residual of the any coding block, and determining the pixel information of the any coding block based on the pixel information of the matched coding block and the pixel prediction residual; and performing super-resolution processing on the pixel information of the any coding block, to obtain the super-resolution pixel block of the any coding block.
 11. The terminal device according to claim 10, wherein the performing super-resolution processing on the pixel information of the any coding block, to obtain the super-resolution pixel block of the any coding block comprises: obtaining a first pixel region in the any coding block, wherein pixel information of the first pixel region is pixel information in the pixel information of the any coding block except pixel information of a second pixel region, and the second pixel region is a pixel region of a preset size in the any coding block; performing pixel padding on the first pixel region, and performing super-resolution processing on the pixel information of the post-pixel-padding first pixel region to obtain a first pixel block corresponding to the any coding block; and performing super-resolution processing on the pixel information of the any coding block, and stitching the first pixel block and a pixel block obtained after super-resolution processing is performed on the pixel information of the any coding block, to obtain the super-resolution pixel block of the any coding block.
 12. The terminal device according to claim 11, wherein the obtaining a first pixel region in the any coding block comprises: determining the second pixel region of the preset size in the any coding block, and determining, as the first pixel region, a region that is in the any coding block and that does not overlap with the second pixel region.
 13. The terminal device according to claim 8, wherein the encoded information further comprises a motion vector predictor MVP and/or a motion vector difference MVD; and the stitching super-resolution pixel blocks of all coding blocks that belong to a same image frame in the video stream to obtain a super-resolution image comprises: determining a motion vector MV of the any coding block based on the inter-frame prediction mode, and the MVP and/or the MVD; determining, based on the MV and a location of the matched coding block in the reference image frame, a location of the any coding block in an image frame to which the any coding block belongs; and stitching the super-resolution pixel blocks of all the coding blocks that belong to the same image frame in the video stream based on locations of all the coding blocks in the image frame to which all the coding blocks belong, to obtain the super-resolution image of the image frame to which all the coding blocks belong.
 14. The terminal device according to claim 8, wherein a frame type of an image frame to which the any coding block belongs is a P frame.
 15. A non-transitory computer-readable storage medium, wherein the computer-readable storage medium stores computer program instructions; and when the computer program instructions are run on a computer, the computer is enabled to perform: obtaining encoded information of any coding block in a video stream, wherein a coding block belongs to an image frame in the video stream; one image frame in the video stream comprises a plurality of coding blocks; the encoded information comprises a reference image frame index, a matched coding block index, and an inter-frame prediction marker; and a matched coding block identified by the matched coding block index is a coding block that is in a reference image frame identified by the reference image frame index and that has a minimum pixel difference from the any coding block; determining an inter-frame prediction mode of the any coding block based on the inter-frame prediction marker, and determining a super-resolution pixel block of the any coding block based on the inter-frame prediction mode and pixel information of the matched coding block, wherein the super-resolution pixel block is a pixel block obtained after super-resolution processing is performed on pixel information of the any coding block; and stitching super-resolution pixel blocks of all coding blocks that belong to a same image frame in the video stream to obtain a super-resolution image, wherein the super-resolution image is used for generating a super-resolution video.
 16. The non-transitory computer-readable storage medium according to claim 15, wherein the pixel information of the any coding block is the same as the pixel information of the matched coding block; and the determining a super-resolution pixel block of the any coding block based on the inter-frame prediction mode and pixel information of the matched coding block comprises: when the inter-frame prediction mode is a first-type prediction mode, determining a matched super-resolution pixel block of the matched coding block as the super-resolution pixel block of the any coding block, wherein the matched super-resolution pixel block comprises a pixel block obtained after super-resolution processing is performed on the pixel information of the matched coding block.
 17. The non-transitory computer-readable storage medium according to claim 15, wherein the determining a super-resolution pixel block of the any coding block based on the inter-frame prediction mode and pixel information of the matched coding block comprises: when the inter-frame prediction mode is a second-type prediction mode, determining a pixel prediction residual of the any coding block, and determining the pixel information of the any coding block based on the pixel information of the matched coding block and the pixel prediction residual; and performing super-resolution processing on the pixel information of the any coding block, to obtain the super-resolution pixel block of the any coding block.
 18. The non-transitory computer-readable storage medium according to claim 17, wherein the performing super-resolution processing on the pixel information of the any coding block, to obtain the super-resolution pixel block of the any coding block comprises: obtaining a first pixel region in the any coding block, wherein pixel information of the first pixel region is pixel information in the pixel information of the any coding block except pixel information of a second pixel region, and the second pixel region is a pixel region of a preset size in the any coding block; performing pixel padding on the first pixel region, and performing super-resolution processing on the pixel information of the post-pixel-padding first pixel region to obtain a first pixel block corresponding to the any coding block; and performing super-resolution processing on the pixel information of the any coding block, and stitching the first pixel block and a pixel block obtained after super-resolution processing is performed on the pixel information of the any coding block, to obtain the super-resolution pixel block of the any coding block.
 19. The non-transitory computer-readable storage medium according to claim 17, wherein the obtaining a first pixel region in the any coding block comprises: determining the second pixel region of the preset size in the any coding block, and determining, as the first pixel region, a region that is in the any coding block and that does not overlap with the second pixel region.
 20. The non-transitory computer-readable storage medium according to claim 15, wherein the encoded information further comprises a motion vector predictor MVP and/or a motion vector difference MVD; and the stitching super-resolution pixel blocks of all coding blocks that belong to a same image frame in the video stream to obtain a super-resolution image comprises: determining a motion vector MV of the any coding block based on the inter-frame prediction mode, and the MVP and/or the MVD; determining, based on the MV and a location of the matched coding block in the reference image frame, a location of the any coding block in an image frame to which the any coding block belongs; and stitching the super-resolution pixel blocks of all the coding blocks that belong to the same image frame in the video stream based on locations of all the coding blocks in the image frame to which all the coding blocks belong, to obtain the super-resolution image of the image frame to which all the coding blocks belong.
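By way of illustration only, and without limiting the claims, the flow recited in claims 15 to 17 and 20 may be sketched as follows. The sketch is not the claimed implementation. All names (`EncodedInfo`, `CodedBlock`, `process_block`, `stitch`, `upscale`) and the mode labels `"first"` and `"second"` are hypothetical. Nearest-neighbour upscaling merely stands in for whatever super-resolution processing is actually used. The sketch further assumes that the MV equals the MVP plus the MVD (with a zero MVD when none is signalled) and that the MV points from the coding block to its matched coding block, so that the location of the coding block is the location of the matched coding block minus the MV.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Block = List[List[int]]   # rows of pixel values
SCALE = 2                 # assumed upscaling factor


def upscale(pixels: Block, scale: int = SCALE) -> Block:
    """Placeholder for a real super-resolution model (nearest-neighbour)."""
    return [[p for p in row for _ in range(scale)]
            for row in pixels for _ in range(scale)]


@dataclass
class CodedBlock:
    pixels: Block          # low-resolution pixel information
    sr: Block              # super-resolution pixel block
    pos: Tuple[int, int]   # (row, col) of the top-left pixel, low-res frame


@dataclass
class EncodedInfo:
    ref_frame: int                    # reference image frame index
    matched_block: int                # matched coding block index
    mode: str                         # hypothetical marker: "first" / "second"
    mvp: Tuple[int, int]              # motion vector predictor (row, col)
    mvd: Tuple[int, int] = (0, 0)     # motion vector difference, if any
    residual: Optional[Block] = None  # pixel prediction residual, if any


def process_block(info: EncodedInfo,
                  frames: Dict[int, Dict[int, CodedBlock]]) -> CodedBlock:
    matched = frames[info.ref_frame][info.matched_block]
    # Assumption: MV = MVP + MVD, pointing from this block to the matched block.
    mv = (info.mvp[0] + info.mvd[0], info.mvp[1] + info.mvd[1])
    pos = (matched.pos[0] - mv[0], matched.pos[1] - mv[1])
    if info.mode == "first":
        # Same pixels as the matched block: reuse its super-resolution result.
        return CodedBlock(matched.pixels, matched.sr, pos)
    # Second-type mode: matched pixels plus residual, then super-resolve.
    pixels = [[m + r for m, r in zip(m_row, r_row)]
              for m_row, r_row in zip(matched.pixels, info.residual)]
    return CodedBlock(pixels, upscale(pixels), pos)


def stitch(blocks: List[CodedBlock], height: int, width: int,
           scale: int = SCALE) -> Block:
    """Place each super-resolution pixel block at its scaled location."""
    frame = [[0] * (width * scale) for _ in range(height * scale)]
    for b in blocks:
        top, left = b.pos[0] * scale, b.pos[1] * scale
        for dy, row in enumerate(b.sr):
            frame[top + dy][left:left + len(row)] = row
    return frame


# Reference frame 0: two 2x2 coding blocks that were already super-resolved.
reference = {0: {
    0: CodedBlock([[1, 2], [3, 4]], upscale([[1, 2], [3, 4]]), (0, 0)),
    1: CodedBlock([[5, 6], [7, 8]], upscale([[5, 6], [7, 8]]), (0, 2)),
}}

# Frame 1: one block of each prediction mode.
block_a = process_block(EncodedInfo(0, 1, "first", mvp=(0, 2)), reference)
block_b = process_block(
    EncodedInfo(0, 0, "second", mvp=(0, -1), mvd=(0, -1),
                residual=[[10, 10], [10, 10]]),
    reference)
sr_frame = stitch([block_a, block_b], height=2, width=4)
```

In this sketch the first-type coding block triggers no super-resolution computation at all, because the result already computed for its matched coding block is reused; only the second-type coding block is passed through `upscale`. This corresponds to the reduction in power consumption and processing delay described for the method.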